Applied Probability Trust (9 July 2010) 



ADAPTIVE ESTIMATION OF VECTOR AUTOREGRESSIVE
MODELS WITH TIME-VARYING VARIANCE: APPLICATION
TO TESTING LINEAR CAUSALITY IN MEAN


Valentin Patilea^a and Hamdi Raïssi^b

a: IRMAR-INSA & CREST-Ensai

b: IRMAR-INSA



First version January 2010 
This version July 2010 



* 20, avenue des buttes de Coesmes, CS 70839, F-35708 Rennes Cedex 7, France. Email: 
valentin.patilea@insa-rennes.fr and hamdi.raissi@insa-rennes.fr 


Abstract 

Linear Vector AutoRegressive (VAR) models where the innovations could be 
unconditionally heteroscedastic and serially dependent are considered. The 
volatility structure is deterministic and quite general, including breaks or 
trending variances as special cases. In this framework we propose Ordinary 
Least Squares (OLS), Generalized Least Squares (GLS) and Adaptive Least 
Squares (ALS) procedures. The GLS estimator requires the knowledge of 
the time-varying variance structure while in the ALS approach the unknown 
variance is estimated by kernel smoothing of the outer products of the
OLS residual vectors. Different bandwidths for the different cells of the
time-varying variance matrix are also allowed. We derive the asymptotic 
distribution of the proposed estimators for the VAR model coefficients and 
compare their properties. In particular we show that the ALS estimator is 
asymptotically equivalent to the infeasible GLS estimator. This asymptotic 
equivalence is obtained uniformly with respect to the bandwidth(s) in a given 
range and hence justifies data-driven bandwidth rules. Using these results we 
build Wald tests for the linear Granger causality in mean which are adapted 
to VAR processes driven by errors with a non stationary volatility. It is also 
shown that the commonly used standard Wald test for the linear Granger 
causality in mean is potentially unreliable in our framework. Monte Carlo 
experiments illustrate the use of the different estimation approaches for the 
analysis of VAR models with stable innovations. 

Keywords: VAR model; Heteroscedastic errors; Adaptive least squares; Ordinary
least squares; Kernel smoothing; Linear causality in mean.
JEL Classification: C01; C32



1. Introduction 

In recent years the study of linear time series models with unconditionally
heteroscedastic innovations has attracted increasing interest. This interest
may be explained by the fact that numerous applied works pointed out that
changes in unconditional volatility are a common feature of economic data.
For instance Doyle and Faust (2005), Ramey and Vine (2006), McConnell and
Perez-Quiros (2000), Blanchard and Simon (2001), among other references,
pointed out a declining volatility for many economic data since the 1980s.
Sensier and van Dijk (2004) found that 80% of the 214 U.S. macroeconomic
time series they considered exhibit a break in volatility.

In the univariate time series case Busetti and Taylor (2003), Cavaliere (2004), 
Cavaliere and Taylor (2007) and Kim, Leybourne and Newbold (2002) among other 
references, considered the test of unit roots with non stationary volatility, while Sanso, 
Arago and Carrion (2004) proposed tests to detect volatility breaks in the residuals. 
Robinson (1987) and Hansen (1995) studied univariate linear models with a non sta- 
tionary volatility. Phillips and Xu (2005) investigated the Ordinary Least Squares 
(OLS) estimation of univariate stable autoregressive processes. Xu and Phillips (2008)
considered the same model and proposed an Adaptive Least Squares (ALS) approach
which is based on nonparametric estimation of the volatility of the innovations using
OLS residuals. The main conclusion of Xu and Phillips (2008) is that the ALS
estimating approach could be much more effective than the OLS estimation. They also
found that the asymptotic behavior of the ALS estimator does not depend on the
volatility structure. Multivariate processes are often used in econometric applications
because they allow the study of cross-correlations between variables. In the multivariate
framework Boswijk and Zu (2007) and Cavaliere, Rahbek and Taylor (2007) studied 
cointegrated systems in presence of non stationary volatility. 

In this paper we study the inference in linear vector autoregressive (VAR) models 
with volatility changes and possibly serially dependent innovations. Three methods 
for estimating the VAR coefficients are investigated: OLS, infeasible Generalized Least 
Squares (GLS) based on the knowledge of the time-varying volatility structure, and 
ALS which is defined like the GLS but using a kernel estimate of the volatility structure. 
The kernel smoothing could be used with a single bandwidth for the whole volatility 
matrix or with different bandwidths for different cells. In some sense, we extend the
approach of Phillips and Xu (2005) and Xu and Phillips (2008) to the VAR framework.
In particular, we see that in the multivariate case the asymptotic distribution of the
GLS and ALS estimators is no longer free from the time-varying volatility structure.
Moreover, our asymptotic results are uniform with respect to the bandwidth in a given
range. This opens the door to data-driven choices of the smoothing parameter, for
instance by cross-validation. Such uniformity results seem new even for the univariate
case.

As an application of the new estimation methodology, we also consider the problem
of testing linear causality in mean. The linear causality in mean, introduced by Granger
(1969), is often used to investigate causal relations between subsets of variables. For 
instance Sims (1972), Feige and Pearce (1979) or Stock and Watson (1989) studied 
the money-income causality relation. Bataa et al. (2009) studied the links between 
the inflations of different countries by testing linear causality relations. This can be 
explained by the fact that linear causality in mean can be easily tested by considering 
tests of zero restrictions on the parameters of VAR models. However, the existing test
procedures for checking the linear causality in mean are based on the iid innovation
assumption, while several empirical analyses contradict this setting. For instance, Bataa
et al. (2009) underlined the presence of volatility breaks in their data set. In this paper,
we use our theoretical results on the OLS and ALS estimation to propose new Wald 
tests for linear causality in mean adapted to the framework of non-stationary volatility. 
The asymptotic chi-square distribution of the new Wald type statistic obtained from 
the ALS approach is derived uniformly with respect to the bandwidth(s). 

The structure of the paper is as follows. Section 2 outlines the heteroscedastic VAR
model, introduces the assumptions and the definitions of OLS and GLS estimators.
Section 3 contains the results on the asymptotic behavior of the OLS and the infeasible
Generalized Least Squares estimators. We also propose an estimator for the asymptotic
variance of the OLS estimator. The ALS estimator based on kernel smoothing of
OLS residuals is proposed in Section 4 as a feasible asymptotically equivalent version
of the GLS estimator. The asymptotic equivalence between ALS and GLS estimators is
proved uniformly in the bandwidths involved in volatility estimation. To prove this
equivalence we use, among other technical arguments, a recent version of a uniform
CLT for martingale difference arrays obtained by Bae et al. (2010), Bae and Choi
(1999). A procedure for estimating the asymptotic variance of the ALS estimator is
also provided. The application of the new inference methodologies to the test of the
linear Granger causality in mean in the presence of time-varying volatility is presented
in Section 5. The benefit from using our new Wald type test statistics and the failure
of the classical Wald test designed for iid innovations are illustrated through an example.
In Section 6 the finite sample properties of the different tests considered in this paper
are studied by means of Monte Carlo experiments. The better precision of the ALS
estimator when compared to the OLS estimator is also highlighted. The proofs are
relegated to the appendix.

The following notations will be used throughout the paper. We denote by $A \otimes B$
the Kronecker product of two matrices $A$ and $B$, and $A \otimes A$ by $A^{\otimes 2}$. The vector
obtained by stacking the columns of $A$ is denoted $\mathrm{vec}(A)$. The symbol $\Rightarrow$ denotes the
convergence in distribution and we denote by $\stackrel{p}{\to}$ the convergence in probability. We
denote by $[u]$ the integer part of a real number $u$. The determinant of a square matrix
$A$ is denoted by $\det A$.

2. The model and least squares estimation of the parameters 

Let us consider the observations $X_{-p+1}, \dots, X_0, X_1, \dots, X_T$ generated by the
following VAR model

$$X_t = A_1 X_{t-1} + \dots + A_p X_{t-p} + u_t, \tag{2.1}$$

$$u_t = H_t \epsilon_t,$$

where the $X_t$'s are $d$-dimensional vectors. The stability condition on the matrices $A_i$,
that is $\det A(z) \neq 0$ for all $|z| \le 1$ with $A(z) = I_d - \sum_{i=1}^{p} A_i z^i$, where $I_d$ denotes
the $d \times d$ identity matrix, is assumed to hold. For a random variable $x$ we define
$\| x \|_r = (E \| x \|^r)^{1/r}$, where $\| x \|$ denotes the Euclidean norm. We also define
$\mathcal{F}_t$ as the $\sigma$-field generated by $\{\epsilon_s : s \le t\}$. The following assumption on the
$H_t$'s and the process $(\epsilon_t)$ gives the framework of our paper.

Assumption A1: (i) The $d \times d$ matrices $H_t$ are invertible and satisfy $H_{[Tr]} = G(r)$,
where the components of the matrix $G(r) := \{g_{kl}(r)\}$ are measurable deterministic
functions on the interval $(0, 1]$, such that $\sup_{r \in (0,1]} |g_{kl}(r)| < \infty$, and each $g_{kl}$
satisfies a Lipschitz condition piecewise on a finite number of sub-intervals that
partition $(0, 1]$. The matrix $\Sigma(r) = G(r)G(r)'$ is assumed positive definite for all $r$.
(ii) The process $(\epsilon_t)$ is $\alpha$-mixing and such that $E(\epsilon_t \mid \mathcal{F}_{t-1}) = 0$,
$E(\epsilon_t \epsilon_t' \mid \mathcal{F}_{t-1}) = I_d$ and the components $\epsilon_{kt}$ of the process $(\epsilon_t)$ satisfy
$\sup_t \| \epsilon_{kt} \|_{4\mu} < \infty$ for some $\mu > 1$ and all $k \in \{1, \dots, d\}$.

The assumption A1 generalizes the assumption of Xu and Phillips (2008) to the
multivariate case. From the assumption $E(\epsilon_t \mid \mathcal{F}_{t-1}) = 0$, the innovations are possibly
serially dependent. However since $G(r)$ is deterministic and $E(\epsilon_t \epsilon_t' \mid \mathcal{F}_{t-1}) = I_d$, we do
not allow the error process to follow a multivariate GARCH model. Cavaliere, Rahbek
and Taylor (2007) considered a volatility structure similar to ours. Their assumption is
slightly different from A1 in the sense that they do not require a Lipschitz condition
and allow for a countable number of jumps. Boswijk and Zu (2007) allow the matrix
$H_t$ to be possibly stochastic, but require the volatility process to be continuous with
other additional assumptions, which in particular excludes important cases like abrupt
shifts. Hafner and Herwartz (2009) assumed no structure on the volatility of the error
process $(u_t)$ and allow for conditional heteroscedasticity. Nevertheless their framework
excludes the use of information on the volatility structure and could result in a loss of
efficiency in the statistical inference of the model. In addition Hafner and Herwartz
(2009) also assumed

$$\lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} \Sigma_t = \Sigma \quad \text{and} \quad \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} E\{(\tilde{X}_{t-1} \tilde{X}_{t-1}') \otimes (u_t u_t')\} = W,$$

where $\Sigma_t = E(u_t u_t')$, $\tilde{X}_{t-1} = (X_{t-1}', \dots, X_{t-p}')' \in \mathbb{R}^{pd}$ and $W$, $\Sigma$ are positive definite
matrices, and this could be viewed as too restrictive. If we suppose that the volatility
matrix $H_t$ is constant, we retrieve the standard homoscedastic case. However the
assumption of homoscedastic errors is often considered to be too restrictive for
macroeconomic or financial applications. Indeed many applied studies pointed out that such
data may display unconditional non-stationary volatility (see e.g. Kim and Nelson
(1999), Warnock and Warnock (2000) or Batbekh et al. (2007)). Starica and Granger
(2005) found that when large samples of stock returns are considered, taking into
account shifts in the unconditional volatility instead of assuming a stationary model
such as a GARCH(1,1) improves the volatility forecasts.

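Let us denote by $\theta_0 = (\mathrm{vec}(A_1)', \dots, \mathrm{vec}(A_p)')' \in \mathbb{R}^{pd^2}$ the vector of the true
parameters. The equation (2.1) becomes

$$X_t = (\tilde{X}_{t-1}' \otimes I_d)\theta_0 + u_t,$$

$$u_t = H_t \epsilon_t,$$

where we keep the notation $\tilde{X}_{t-1} = (X_{t-1}', \dots, X_{t-p}')'$. Using this expression we first
define the OLS estimator

$$\hat{\theta}_{OLS} = \hat{\Sigma}_{\tilde{X}}^{-1} \mathrm{vec}(\hat{\Sigma}_X),$$

where

$$\hat{\Sigma}_{\tilde{X}} = T^{-1} \sum_{t=1}^{T} \tilde{X}_{t-1} \tilde{X}_{t-1}' \otimes I_d \quad \text{and} \quad \hat{\Sigma}_X = T^{-1} \sum_{t=1}^{T} X_t \tilde{X}_{t-1}'.$$

Next, let us define the unconditional variance $\Sigma_t := H_t H_t'$ and the Generalized Least
Squares (GLS) estimator that takes into account a time-varying $\Sigma_t$, that is

$$\hat{\theta}_{GLS} = \hat{\Sigma}_{\underline{\tilde{X}}}^{-1} \mathrm{vec}(\hat{\Sigma}_{\underline{X}}), \tag{2.2}$$

with

$$\hat{\Sigma}_{\underline{\tilde{X}}} = T^{-1} \sum_{t=1}^{T} \tilde{X}_{t-1} \tilde{X}_{t-1}' \otimes \Sigma_t^{-1} \quad \text{and} \quad \hat{\Sigma}_{\underline{X}} = T^{-1} \sum_{t=1}^{T} \Sigma_t^{-1} X_t \tilde{X}_{t-1}'.$$

Note that since $H_t$ is assumed invertible, $\Sigma_t$ is positive definite for all $t$. If we suppose
that the volatility matrix $\Sigma_t$ is constant in time, it is easy to see that $\hat{\theta}_{GLS} = \hat{\theta}_{OLS}$.
However the GLS estimator is in general infeasible since the true volatility matrix
appears in the expression (2.2). In the next section we compare the efficiency of the
OLS and GLS estimators.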
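
The two estimators of this section can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the function names `lagged_regressors` and `weighted_ls`, the toy VAR(1) coefficients and the variance break are illustrative choices, not taken from the paper. The factor $T^{-1}$ is omitted on both sides because it cancels.

```python
import numpy as np

def lagged_regressors(X, p):
    """Split a (T, d) sample into Y[t] = X_t and Z[t] = (X_{t-1}', ..., X_{t-p}')'."""
    T = X.shape[0]
    Y = X[p:]
    Z = np.hstack([X[p - j:T - j] for j in range(1, p + 1)])
    return Y, Z

def weighted_ls(Y, Z, Sigmas=None):
    """Solve [sum_t Z_t Z_t' (x) W_t] theta = vec(sum_t W_t Y_t Z_t'), W_t = Sigma_t^{-1}.

    Sigmas=None uses W_t = I_d (OLS); otherwise Sigmas has shape (n, d, d) (GLS).
    """
    n, d = Y.shape
    k = Z.shape[1]
    lhs = np.zeros((k * d, k * d))
    rhs = np.zeros(k * d)
    for t in range(n):
        W = np.eye(d) if Sigmas is None else np.linalg.inv(Sigmas[t])
        lhs += np.kron(np.outer(Z[t], Z[t]), W)
        rhs += (W @ np.outer(Y[t], Z[t])).flatten(order="F")  # vec() stacks columns
    return np.linalg.solve(lhs, rhs)

# Toy bivariate VAR(1) with a variance break at mid-sample (illustrative values).
rng = np.random.default_rng(0)
T, A = 400, np.array([[0.5, 0.1], [0.1, 0.3]])
scale = np.where(np.arange(T) < T // 2, 1.0, 3.0)
X = np.zeros((T, 2))
for t in range(1, T):
    X[t] = A @ X[t - 1] + scale[t] * rng.standard_normal(2)

Y, Z = lagged_regressors(X, p=1)
Sigmas = np.array([s ** 2 * np.eye(2) for s in scale[1:]])  # known Sigma_t (infeasible GLS)
theta_ols = weighted_ls(Y, Z)
theta_gls = weighted_ls(Y, Z, Sigmas)
```

Passing a constant sequence of matrices as `Sigmas` reproduces `theta_ols`, in line with the remark that the GLS and OLS estimators coincide when $\Sigma_t$ is constant in time.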



3. Asymptotic behaviour of the estimators 

In order to state the first result of the paper, we need to introduce the following
notations. Since we assumed that $\det A(z) \neq 0$ for all $|z| \le 1$, it is well known that

$$X_t = \sum_{i=0}^{\infty} \psi_i u_{t-i}, \tag{3.1}$$

where $\psi_0 = I_d$ and the components of the $\psi_i$'s are absolutely summable (see e.g.
Lütkepohl (2005, pp 14-16)). From the expression (3.1) we also write

$$\tilde{X}_t = \sum_{i=0}^{\infty} \tilde{\psi}_i u^p_{t-i},$$

where $u_t^p$ is given by $u_t^p = \mathbf{1}_p \otimes u_t$, $\mathbf{1}_p$ is the vector of ones of dimension $p$ and

$$\tilde{\psi}_i = \begin{pmatrix} \psi_i & & & \\ & \psi_{i-1} & & \\ & & \ddots & \\ & & & \psi_{i-p+1} \end{pmatrix},$$

taking $\psi_j = 0$ for $j < 0$. Let us denote by $\mathbf{1}_{p \times p}$ the $p \times p$ matrix with components equal
to one. The following proposition gives the asymptotic behavior of the OLS and GLS
estimators. For the sake of brevity we only investigate the asymptotic normality; the
consistency is in some sense an easier matter and is hence omitted.

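Proposition 3.1. If Assumption A1 holds true, then:

1. $$T^{1/2}(\hat{\theta}_{GLS} - \theta_0) \Rightarrow \mathcal{N}(0, \Lambda_1^{-1}), \tag{3.2}$$

where

$$\Lambda_1 = \int_0^1 \sum_{i=0}^{\infty} \left\{ \tilde{\psi}_i (\mathbf{1}_{p \times p} \otimes \Sigma(r)) \tilde{\psi}_i' \right\} \otimes \Sigma(r)^{-1} \, dr$$

is positive definite;

2. $$T^{1/2}(\hat{\theta}_{OLS} - \theta_0) \Rightarrow \mathcal{N}(0, \Lambda_3^{-1} \Lambda_2 \Lambda_3^{-1}), \tag{3.3}$$

where

$$\Lambda_2 = \int_0^1 \sum_{i=0}^{\infty} \left\{ \tilde{\psi}_i (\mathbf{1}_{p \times p} \otimes \Sigma(r)) \tilde{\psi}_i' \right\} \otimes \Sigma(r) \, dr$$

and

$$\Lambda_3 = \int_0^1 \sum_{i=0}^{\infty} \left\{ \tilde{\psi}_i (\mathbf{1}_{p \times p} \otimes \Sigma(r)) \tilde{\psi}_i' \right\} \otimes I_d \, dr$$

are positive definite;

3. The asymptotic variance of $\hat{\theta}_{GLS}$ is smaller than the asymptotic variance of $\hat{\theta}_{OLS}$,
that is the matrix $\Lambda_3^{-1} \Lambda_2 \Lambda_3^{-1} - \Lambda_1^{-1}$ is positive semidefinite.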

If we suppose that the error process is homoscedastic, that is $\Sigma_t = \Sigma_u$ for all $t$, and
since we assumed $E(\epsilon_t \epsilon_t' \mid \mathcal{F}_{t-1}) = I_d$, we obtain

$$\Lambda_1 = E[\tilde{X}_t \tilde{X}_t'] \otimes \Sigma_u^{-1}, \quad \Lambda_2 = E[\tilde{X}_t \tilde{X}_t'] \otimes \Sigma_u \quad \text{and} \quad \Lambda_3 = E[\tilde{X}_t \tilde{X}_t'] \otimes I_d,$$

so that we retrieve the standard result of the iid case (see e.g. Lütkepohl (2005, p 74))

$$\Lambda_1^{-1} = \Lambda_3^{-1} \Lambda_2 \Lambda_3^{-1} = \{E[\tilde{X}_t \tilde{X}_t']\}^{-1} \otimes \Sigma_u, \tag{3.4}$$

although here the error process is assumed dependent. Note that in the homoscedastic
case the OLS and ALS estimators have the same efficiency.

In the univariate case ($d = 1$), $\Sigma(r)$ is real-valued so that $\Lambda_1$ simplifies to

$$\Lambda_1 = \sum_{i=0}^{\infty} \tilde{\psi}_i \mathbf{1}_{p \times p} \tilde{\psi}_i', \tag{3.5}$$

where the $\tilde{\psi}_i$'s are $p \times p$ diagonal matrices. This expression corresponds to the
asymptotic covariance matrix obtained in equation (10) of Xu and Phillips (2008). Moreover,

$$\Lambda_2 = \int_0^1 \Sigma(r)^2 dr \left\{ \sum_{i=0}^{\infty} \tilde{\psi}_i \mathbf{1}_{p \times p} \tilde{\psi}_i' \right\}, \quad \Lambda_3 = \int_0^1 \Sigma(r) dr \left\{ \sum_{i=0}^{\infty} \tilde{\psi}_i \mathbf{1}_{p \times p} \tilde{\psi}_i' \right\},$$

and then we retrieve equation (5) in Xu and Phillips (2008).

A nice feature of the GLS estimator in the univariate case is that the covariance
matrix of the asymptotic distribution does not depend on the volatility function $\Sigma(r)$.
In the multivariate case the simplification (3.5) is still possible if $\Sigma(r) = \sigma^2(r) I_d$, with
$\sigma^2(r)$ a scalar function. Nevertheless, we show in Example 3.1 below that (3.5) does
not hold in the general multivariate framework and the asymptotic covariance matrix
in (3.2) depends on the volatility function $\Sigma(r)$. Moreover, our example shows that
the covariance matrices in (3.2) and (3.3) can be equal in some particular cases of
heteroscedasticity but in general they could be very different.

Example 3.1. Consider the bivariate model (2.1) with $p = 1$ and

$$A_1 = \begin{pmatrix} a_1 & 0 \\ 0 & a_2 \end{pmatrix}, \quad \Sigma(r) = \begin{pmatrix} \Sigma_1(r) & 0 \\ 0 & \Sigma_2(r) \end{pmatrix}.$$

In this simple case let us compare the asymptotic variances

$$Var_{as}(\hat{\theta}_{2,GLS}) = (1 - a_1^2) \times \left( \int_0^1 \Sigma_1(r) / \Sigma_2(r) \, dr \right)^{-1}$$

and

$$Var_{as}(\hat{\theta}_{2,OLS}) = (1 - a_1^2) \times \frac{\int_0^1 \Sigma_1(r) \Sigma_2(r) \, dr}{\left( \int_0^1 \Sigma_1(r) \, dr \right)^2},$$

that is the asymptotic variances of the GLS and OLS estimators of the second
component of the vector $\theta_0 = (a_1, 0, 0, a_2)'$ (which corresponds to the element (2, 1) of the
matrix $A_1$).

First we notice that $Var_{as}(\hat{\theta}_{2,GLS})$ depends on the volatility structure when
$\Sigma_1(r) \neq \Sigma_2(r)$. In order to illustrate the difference between the variances of
$\hat{\theta}_{2,OLS}$ and $\hat{\theta}_{2,GLS}$, we plot the ratio

$$Var_{as}(\hat{\theta}_{2,OLS}) / Var_{as}(\hat{\theta}_{2,GLS}) \tag{3.6}$$

in Figure 7.1 taking

$$\Sigma_1(r) = \sigma_{10}^2 + (\sigma_{11}^2 - \sigma_{10}^2) \times \mathbf{1}_{[\tau_1, 1]}(r) \quad \text{and} \quad \Sigma_2(r) = \sigma_{20}^2 + (\sigma_{21}^2 - \sigma_{20}^2) \times \mathbf{1}_{[\tau_2, 1]}(r),$$

where $\tau_i \in [0, 1]$ with $i \in \{1, 2\}$. This specification of the volatility function is
inspired by Example 1 of Xu and Phillips (2008) (see also Cavaliere (2004)). On the
left graphic we take $\tau_1 = \tau_2$ and $\sigma_{10}^2 = \sigma_{20}^2 = \sigma_{11}^2 = 1$ but $\sigma_{21}^2 > 1$, so that only
$(X_{2t})$ is heteroscedastic in general. When $\sigma_{21}^2 = 1$ or $\tau_1 \in \{0, 1\}$, the process $(X_t)$ is
homoscedastic. On the right graphic we take $\sigma_{10}^2 = \sigma_{20}^2 = 1$ and $\sigma_{11}^2 = \sigma_{21}^2 = 3$ but
$\tau_1 \neq \tau_2$ in general. When $\tau_1 = \tau_2$, we have $\Sigma_1(r) = \Sigma_2(r)$ and hence we retrieve the
case studied in Example 1 of Xu and Phillips (2008).

As expected the ratio (3.6) is equal to one in the homoscedastic case in the left
graphic. However, departure from this case clearly shows that the difference between
the variances of the two estimators is increasing with $\sigma_{21}^2$. In the right graphic we
can see that when $\tau_2 = 0$ or $1$ the ratio in (3.6) is equal to one although $(X_t)$ is
heteroscedastic. The variances $Var_{as}(\hat{\theta}_{2,OLS})$ and $Var_{as}(\hat{\theta}_{2,GLS})$ are different when
$\tau_2 \in (0, 1)$ and the largest relative difference is attained when we set the volatility shifts
in the middle of the sample.
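
The ratio (3.6) is easy to evaluate numerically for the step volatilities of this example. The sketch below is an illustration only: the helper names `step_vol` and `variance_ratio`, the midpoint integration rule and the particular parameter values are assumptions made here, not part of the paper.

```python
import numpy as np

def step_vol(r, s0, s1, tau):
    """Sigma_k(r) = s0 + (s1 - s0) * 1{r >= tau}, with s0 and s1 variances."""
    return s0 + (s1 - s0) * (r >= tau)

def variance_ratio(sig1, sig2, n=200_000):
    """Ratio (3.6); the common factor (1 - a_1^2) cancels."""
    r = (np.arange(n) + 0.5) / n            # midpoint rule on (0, 1)
    S1, S2 = sig1(r), sig2(r)
    var_ols = np.mean(S1 * S2) / np.mean(S1) ** 2
    var_gls = 1.0 / np.mean(S1 / S2)
    return var_ols / var_gls

# Left-graphic type setting: only the second component has a break (sigma_21^2 = 4).
left = variance_ratio(lambda r: step_vol(r, 1.0, 1.0, 0.5),
                      lambda r: step_vol(r, 1.0, 4.0, 0.5))
# Right-graphic type setting with tau_2 = 0: heteroscedastic, yet the ratio is one.
right = variance_ratio(lambda r: step_vol(r, 1.0, 3.0, 0.5),
                       lambda r: step_vol(r, 1.0, 3.0, 0.0))
```

With these illustrative values `left` is larger than one while `right` equals one, which matches the qualitative description of the two graphics.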

It appears that the GLS estimator is more efficient than the OLS estimator in general
when the matrix $\Sigma_t$ is time-varying. Nevertheless the assumption of known volatility
structure needed to construct the GLS estimator could be unrealistic in practice.
Moreover, the asymptotic distribution of the GLS estimator depends on the unknown
volatility. In the OLS estimation approach only the asymptotic distribution of the
coefficients estimator depends on the unknown volatility. In addition, we can provide
simple consistent estimators of $\Lambda_2$ and $\Lambda_3$, which could be further used for instance to
build confidence intervals for the OLS estimators. For the purpose of estimation of $\Lambda_2$
and $\Lambda_3$ let us consider the matrices $\Omega_2 := \int_0^1 \Sigma(r)^{\otimes 2} dr$, $\Omega_3 := \int_0^1 \Sigma(r) dr$ and denote
the OLS residuals by $\hat{u}_t$.

Proposition 3.2. Under Assumption A1 we have

$$\hat{\Omega}_2 := T^{-1} \sum_{t=2}^{T} \hat{u}_{t-1} \hat{u}_{t-1}' \otimes \hat{u}_t \hat{u}_t' = \Omega_2 + o_p(1), \tag{3.7}$$

$$\hat{\Omega}_3 := T^{-1} \sum_{t=1}^{T} \hat{u}_t \hat{u}_t' = \Omega_3 + o_p(1), \tag{3.8}$$

$$\hat{\Lambda}_2 := T^{-1} \sum_{t=1}^{T} \tilde{X}_{t-1} \tilde{X}_{t-1}' \otimes \hat{u}_t \hat{u}_t' = \Lambda_2 + o_p(1), \tag{3.9}$$

$$\hat{\Lambda}_3 := \hat{\Sigma}_{\tilde{X}} = \Lambda_3 + o_p(1). \tag{3.10}$$

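Using (3.7) and (3.8) and some additional algebra, we can define alternative
consistent estimators of $\Lambda_2$ and $\Lambda_3$. Indeed, it is shown in the appendix that

$$\mathrm{vec}(\Lambda_2) = \left\{ I_{(pd^2)^2} - (A \otimes I_d)^{\otimes 2} \right\}^{-1} \mathrm{vec} \begin{pmatrix} \Omega_2 & 0_{d^2 \times (p-1)d^2} \\ 0_{(p-1)d^2 \times d^2} & 0_{(p-1)d^2 \times (p-1)d^2} \end{pmatrix} \tag{3.11}$$

and

$$\mathrm{vec}(\Lambda_3) = \left\{ I_{(pd^2)^2} - (A \otimes I_d)^{\otimes 2} \right\}^{-1} \mathrm{vec} \begin{pmatrix} \Omega_3 \otimes I_d & 0_{d^2 \times (p-1)d^2} \\ 0_{(p-1)d^2 \times d^2} & 0_{(p-1)d^2 \times (p-1)d^2} \end{pmatrix}, \tag{3.12}$$

where $0_{d^2 \times (p-1)d^2}$ is the null matrix of dimension $d^2 \times (p-1)d^2$ and

$$A = \begin{pmatrix} A_1 & \dots & A_{p-1} & A_p \\ I_d & 0 & \dots & 0 \\ & \ddots & \ddots & \vdots \\ 0 & & I_d & 0 \end{pmatrix}$$

is a matrix of dimension $pd \times pd$. Therefore replacing $\Omega_2$ and $\Omega_3$ by respectively $\hat{\Omega}_2$ and
$\hat{\Omega}_3$, and the $A_i$'s by their OLS estimates in the expression of $A$ in (3.11) and (3.12), we
obtain consistent estimators of $\Lambda_2$ and $\Lambda_3$. These estimators will be denoted by $\hat{\Lambda}_{2\delta}$
and $\hat{\Lambda}_{3\delta}$, where the subscript $\delta$ refers to the use of the OLS estimator of $A$.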
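
The closed-form expression (3.12) can be checked numerically against the series definition of $\Lambda_3$ in Proposition 3.1, since $\Lambda_3$ is linear in $\Sigma(r)$ and therefore depends on the volatility only through $\Omega_3$. The following is a sketch under assumptions: the function names, the VAR(2) coefficients and the matrix used for $\Omega_3$ are illustrative, and the series is simply truncated after a fixed number of terms.

```python
import numpy as np

def companion(As):
    """Companion matrix A of dimension pd x pd built from [A_1, ..., A_p]."""
    p, d = len(As), As[0].shape[0]
    A = np.zeros((p * d, p * d))
    A[:d, :] = np.hstack(As)
    A[d:, :-d] = np.eye((p - 1) * d)
    return A

def lambda3_from_vec(As, Omega3):
    """Lambda_3 via (3.12): solve {I - (A (x) I_d)^{(x)2}} vec(Lambda_3) = vec(block)."""
    p, d = len(As), As[0].shape[0]
    n = p * d * d
    M = np.kron(companion(As), np.eye(d))
    B = np.zeros((n, n))
    B[:d * d, :d * d] = np.kron(Omega3, np.eye(d))   # top-left block Omega_3 (x) I_d
    v = np.linalg.solve(np.eye(n * n) - np.kron(M, M), B.flatten(order="F"))
    return v.reshape((n, n), order="F")

def lambda3_from_series(As, Omega3, n_terms=300):
    """Lambda_3 via the truncated series sum_i {psi~_i (1_{pxp} (x) Omega_3) psi~_i'} (x) I_d."""
    p, d = len(As), As[0].shape[0]
    psi = [np.eye(d)]                                # psi_0 = I_d
    for i in range(1, n_terms):
        psi.append(sum(As[j - 1] @ psi[i - j] for j in range(1, min(i, p) + 1)))
    middle = np.kron(np.ones((p, p)), Omega3)
    Gamma = np.zeros((p * d, p * d))
    for i in range(n_terms):
        psi_tilde = np.zeros((p * d, p * d))         # diag(psi_i, ..., psi_{i-p+1})
        for j in range(min(i, p - 1) + 1):           # psi_j = 0 for j < 0
            psi_tilde[j * d:(j + 1) * d, j * d:(j + 1) * d] = psi[i - j]
        Gamma += psi_tilde @ middle @ psi_tilde.T
    return np.kron(Gamma, np.eye(d))

# Illustrative stable VAR(2) and integrated volatility matrix.
As = [np.array([[0.5, 0.1], [0.1, 0.3]]), 0.1 * np.eye(2)]
Omega3 = np.array([[2.0, 0.5], [0.5, 1.0]])
L_vec = lambda3_from_vec(As, Omega3)
L_series = lambda3_from_series(As, Omega3)
```

In practice `Omega3` and `As` would be replaced by $\hat{\Omega}_3$ and the OLS estimates; the linear system has dimension $(pd^2)^2$, so this direct solve is only practical for small $p$ and $d$.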



4. Adaptive estimation 

In the previous section we pointed out that the GLS estimator is generally infeasible
in applications. Therefore we consider a feasible weighted estimator obtained using
nonparametric estimation of the volatility function. Our approach generalizes the
work of Xu and Phillips (2008) to the multivariate case. Let us denote by $A \odot B$ the
Hadamard (entrywise) product of two matrices of same dimension $A$ and $B$. Define
the symmetric matrix

$$\check{\Sigma}_t^0 = \sum_{i=1}^{T} w_{ti} \odot \hat{u}_i \hat{u}_i',$$

where, as before, the $\hat{u}_i$'s are the OLS residuals and the $kl$-element, $k \le l$, of the $d \times d$
matrix of weights $w_{ti}$ is given by

$$w_{ti}(b_{kl}) = \left( \sum_{i=1}^{T} K_{ti}(b_{kl}) \right)^{-1} K_{ti}(b_{kl}),$$

with $b_{kl}$ the bandwidth and

$$K_{ti}(b_{kl}) = \begin{cases} K\left( \frac{t - i}{T b_{kl}} \right) & \text{if } t \neq i, \\ 0 & \text{if } t = i. \end{cases}$$

The kernel function $K(z)$ is bounded nonnegative and such that $\int_{-\infty}^{\infty} K(z) dz = 1$. For
all $1 \le k \le l \le d$ the bandwidth $b_{kl}$ belongs to a range $B_T = [c_{min} b_T, c_{max} b_T]$ with
$c_{min}, c_{max} > 0$ some constants and $b_T \downarrow 0$ at a suitable rate that will be specified
below.

When using the same bandwidth $b_{kl} \in B_T$ for all the cells of $\check{\Sigma}_t^0$, since $\hat{u}_i$, $i =
1, \dots, T$, are almost surely linearly independent of each other, $\check{\Sigma}_t^0$ is almost surely positive
definite provided $T$ is sufficiently large. A similar estimator is considered by Boswijk and Zu
(2007). When using several bandwidths $b_{kl}$ it is no longer clear that the symmetric
matrix $\check{\Sigma}_t^0$ is positive definite. Then we propose to use a regularization of $\check{\Sigma}_t^0$, that is
to replace it by the positive definite matrix

$$\check{\Sigma}_t = \left\{ (\check{\Sigma}_t^0)^2 + \nu_T I_d \right\}^{1/2},$$

where $\nu_T > 0$, $T \ge 1$, is a sequence of real numbers decreasing to zero at a suitable rate
that will be specified below. Our simulation experience indicates that in applications
with moderate and large samples $\nu_T$ could be even set equal to 0.
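
The kernel estimator and its regularization can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `kernel_volatility`, the Gaussian kernel and the simulated residuals are assumptions made here; the leave-one-out weights and the regularization follow the definitions of this section.

```python
import numpy as np

def gaussian_kernel(z):
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def kernel_volatility(resid, b, nu=0.0, kernel=gaussian_kernel):
    """Regularized kernel estimate of Sigma_t, returned as a (T, d, d) array.

    resid : (T, d) array of OLS residuals.
    b     : scalar bandwidth, or symmetric (d, d) array of cell-wise bandwidths.
    nu    : regularization constant nu_T >= 0.
    """
    T, d = resid.shape
    b = np.broadcast_to(np.asarray(b, dtype=float), (d, d))
    lags = (np.arange(T)[:, None] - np.arange(T)[None, :]) / T   # (t - i) / T
    outer = resid[:, :, None] * resid[:, None, :]                # u_i u_i' for each i
    sig0 = np.empty((T, d, d))
    for k in range(d):
        for l in range(d):
            K = kernel(lags / b[k, l])
            np.fill_diagonal(K, 0.0)                             # K_tt = 0 (leave-one-out)
            W = K / K.sum(axis=1, keepdims=True)                 # rows of weights sum to one
            sig0[:, k, l] = W @ outer[:, k, l]
    # {(sig0)^2 + nu I}^{1/2} through the eigen-decomposition of each symmetric sig0[t].
    vals, vecs = np.linalg.eigh(sig0)
    root = np.sqrt(vals ** 2 + nu)
    return (vecs * root[:, None, :]) @ np.swapaxes(vecs, 1, 2)

# Illustrative residuals with a linearly trending variance.
rng = np.random.default_rng(1)
T = 200
trend = np.sqrt(1.0 + 3.0 * np.arange(1, T + 1) / T)
resid = rng.standard_normal((T, 2)) * trend[:, None]
sigma_single = kernel_volatility(resid, b=0.15)
sigma_cells = kernel_volatility(resid, b=np.array([[0.10, 0.20], [0.20, 0.10]]), nu=0.01)
```

With `nu=0` the square root of the squared matrix returns the unregularized estimate whenever that estimate is positive definite, which is the situation described above for a single bandwidth.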

In practice the bandwidths $b_{kl}$ can be chosen by minimization of a cross-validation
criterion like

$$\sum_{t=1}^{T} \| \check{\Sigma}_t - \hat{u}_t \hat{u}_t' \|^2,$$

with respect to all $b_{kl} \in B_T$, $1 \le k \le l \le d$, where $\| \cdot \|$ is some norm for a square
matrix, for instance the Frobenius norm that is the square root of the sum of the
squares of matrix elements. Our theoretical results below are obtained uniformly with
respect to the bandwidths $b_{kl} \in B_T$ and this brings a justification for the common
cross-validation bandwidth selection approach in the framework we consider. To the
best of our knowledge, this justification is new and hence completes previous procedures of
Xu and Phillips (2008) and Boswijk and Zu (2007).
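
A grid search over a single common bandwidth gives a simple version of this cross-validation rule. The sketch below rests on several assumptions that are not in the paper: a Gaussian kernel, one bandwidth for all cells, $\nu_T = 0$ (so the unregularized leave-one-out estimate is used in the criterion), the Frobenius norm, and an arbitrary grid; the names `loo_kernel_variance` and `cv_bandwidth` are hypothetical.

```python
import numpy as np

def loo_kernel_variance(resid, b):
    """Leave-one-out Gaussian-kernel estimate of Sigma_t with one bandwidth b."""
    T = resid.shape[0]
    z = (np.arange(T)[:, None] - np.arange(T)[None, :]) / (T * b)
    K = np.exp(-0.5 * z ** 2)                      # normalizing constant cancels in W
    np.fill_diagonal(K, 0.0)                       # K_tt = 0: u_t is left out
    W = K / K.sum(axis=1, keepdims=True)
    outer = resid[:, :, None] * resid[:, None, :]  # (T, d, d) array of u_i u_i'
    return np.einsum("ti,ikl->tkl", W, outer)

def cv_bandwidth(resid, grid):
    """Return the grid value minimizing sum_t ||Sigma_t(b) - u_t u_t'||_F^2, and all scores."""
    outer = resid[:, :, None] * resid[:, None, :]
    scores = np.array([np.sum((loo_kernel_variance(resid, b) - outer) ** 2) for b in grid])
    return grid[int(np.argmin(scores))], scores

# Illustrative residuals with a variance break at mid-sample.
rng = np.random.default_rng(2)
T = 200
scale = np.where(np.arange(T) < T // 2, 1.0, 3.0)
resid = rng.standard_normal((T, 2)) * scale[:, None]
grid = np.linspace(0.05, 0.5, 8)
b_cv, cv_scores = cv_bandwidth(resid, grid)
```

Because the diagonal weights are zero, each $\hat{u}_t \hat{u}_t'$ is compared with an estimate that does not use it, which is what prevents the criterion from trivially favouring the smallest bandwidth.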

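Let us now introduce the following adaptive least squares (ALS) estimator

$$\hat{\theta}_{ALS} = \check{\Sigma}_{\underline{\tilde{X}}}^{-1} \mathrm{vec}(\check{\Sigma}_{\underline{X}}),$$

with

$$\check{\Sigma}_{\underline{\tilde{X}}} = T^{-1} \sum_{t=1}^{T} \tilde{X}_{t-1} \tilde{X}_{t-1}' \otimes \check{\Sigma}_t^{-1} \quad \text{and} \quad \check{\Sigma}_{\underline{X}} = T^{-1} \sum_{t=1}^{T} \check{\Sigma}_t^{-1} X_t \tilde{X}_{t-1}'.$$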

Assumption A1': Suppose that all the conditions in Assumption A1(i) hold true.
In addition:

(i) $\inf_{r \in (0,1]} \lambda_{min}(\Sigma(r)) > 0$ where $\lambda_{min}(\Gamma)$ denotes the smallest eigenvalue of the
symmetric matrix $\Gamma$.

(ii) $\sup_t \| \epsilon_{kt} \|_8 < \infty$ for all $k \in \{1, \dots, d\}$.

Assumption A2: (i) The kernel $K(\cdot)$ is a bounded density function defined on
the real line such that $K(\cdot)$ is nondecreasing on $(-\infty, 0]$ and decreasing on $[0, \infty)$ and
$\int_{\mathbb{R}} v^2 K(v) dv < \infty$. The function $K(\cdot)$ is differentiable except at a finite number of points
and the derivative $K'(\cdot)$ is an integrable function. Moreover, the Fourier Transform
$\mathcal{F}[K](\cdot)$ of $K(\cdot)$ satisfies $\int_{\mathbb{R}} |s \mathcal{F}[K](s)| ds < \infty$.

(ii) The bandwidths $b_{kl}$, $1 \le k \le l \le d$, are taken in the range $B_T = [c_{min} b_T, c_{max} b_T]$
with $0 < c_{min} < c_{max} < \infty$ and $b_T + 1/(T b_T^{2+\gamma}) \to 0$ as $T \to \infty$, for some $\gamma > 0$.

Assumptions A1' and A2(ii) are natural extensions to the multivariate framework
of the assumptions used in Theorem 2 of Xu and Phillips (2008). The conditions on
the kernel function are convenient assumptions satisfied by almost all commonly used
kernels. These conditions allow for simpler technical arguments when investigating
the rates of convergence uniformly with respect to the bandwidths. The condition on
the sequence $b_T$, $T \ge 1$, is slightly more restrictive than the one imposed by Xu and
Phillips (2008) in the univariate case, that is $b_T + 1/(T b_T^2) \to 0$, and this is the price we
pay for obtaining the results uniformly in the bandwidths in a range $B_T$.

Let $\Omega_1 := \int_0^1 \Sigma(r) \otimes \Sigma(r)^{-1} dr$. In the sequel, we say that a sequence of random
matrices $A_T$, $T \ge 1$, is $o_p(1)$ uniformly with respect to (w.r.t.) $b_{kl} \in B_T$ as $T \to \infty$
if $\sup_{1 \le k \le l \le d} \sup_{b_{kl} \in B_T} \| \mathrm{vec}(A_T) \| \stackrel{p}{\to} 0$. The following proposition gives the
asymptotic behavior of the adaptive estimators uniformly w.r.t. the bandwidths.

Proposition 4.1. Under A1' and A2 and provided $T \nu_T^2 \to 0$, uniformly w.r.t. $b_{kl} \in
B_T$ as $T \to \infty$

$$\check{\Lambda}_1 := \check{\Sigma}_{\underline{\tilde{X}}} = \Lambda_1 + o_p(1),$$

$$\check{\Omega}_1 := T^{-1} \sum_{t=1}^{T} \check{\Sigma}_t \otimes \check{\Sigma}_t^{-1} = \Omega_1 + o_p(1)$$

and

$$\sqrt{T}(\hat{\theta}_{ALS} - \hat{\theta}_{GLS}) = o_p(1).$$

Proposition 4.1 shows that the ALS and GLS estimators have the same asymptotic
behavior, that is the ALS estimator is consistent in probability and $\sqrt{T}$-asymptotically
normal as soon as the GLS estimator has such properties. The results remain true
even if the bandwidths $b_{kl} \in B_T$ are data dependent.
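
The three steps of the adaptive procedure (OLS residuals, kernel smoothing of their outer products, weighted least squares) can be put together in a few lines. This is a minimal sketch under assumptions: a Gaussian kernel, a single bandwidth, $\nu_T = 0$, time indices rescaled by the number of usable observations rather than by $T$, and illustrative simulation values; the name `als_estimate` is hypothetical.

```python
import numpy as np

def als_estimate(X, p, b):
    """Return (theta_als, theta_ols) for a (T, d) sample X and bandwidth b."""
    T, d = X.shape
    Y = X[p:]
    Z = np.hstack([X[p - j:T - j] for j in range(1, p + 1)])   # rows (X_{t-1}', ..., X_{t-p}')
    n = Y.shape[0]
    # Step 1: OLS fit and residuals; B_ols is the d x pd matrix [A_1 ... A_p].
    B_ols = np.linalg.lstsq(Z, Y, rcond=None)[0].T
    U = Y - Z @ B_ols.T
    # Step 2: leave-one-out kernel estimate of Sigma_t from the residual outer products.
    z = (np.arange(n)[:, None] - np.arange(n)[None, :]) / (n * b)
    K = np.exp(-0.5 * z ** 2)
    np.fill_diagonal(K, 0.0)
    W = K / K.sum(axis=1, keepdims=True)
    Sig_inv = np.linalg.inv(np.einsum("ti,ik,il->tkl", W, U, U))
    # Step 3: weighted least squares with weights Sigma_t^{-1}.
    # lhs[(a,k),(b,l)] = sum_t Z_ta Z_tb W_t[k,l], i.e. sum_t Z_t Z_t' (x) W_t.
    lhs = np.einsum("ta,tb,tkl->akbl", Z, Z, Sig_inv).reshape(p * d * d, p * d * d)
    # rhs[(a,k)] = sum_t Z_ta (W_t Y_t)_k, i.e. vec(sum_t W_t Y_t Z_t').
    rhs = np.einsum("ta,tkl,tl->ak", Z, Sig_inv, Y).reshape(-1)
    return np.linalg.solve(lhs, rhs), B_ols.flatten(order="F")

# Illustrative bivariate VAR(1) with a variance break.
rng = np.random.default_rng(3)
T, A = 400, np.array([[0.5, 0.1], [0.1, 0.3]])
scale = np.where(np.arange(T) < T // 2, 1.0, 3.0)
X = np.zeros((T, 2))
for t in range(1, T):
    X[t] = A @ X[t - 1] + scale[t] * rng.standard_normal(2)

theta_als, theta_ols = als_estimate(X, p=1, b=0.1)
theta_wide, _ = als_estimate(X, p=1, b=1e6)   # very large bandwidth: nearly constant weights
```

With a very large bandwidth the estimated variance is almost constant in $t$, so `theta_wide` is close to `theta_ols`.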

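On the other hand, similarly to (3.11) and (3.12),

$$\mathrm{vec}(\Lambda_1) = \left\{ I_{(pd^2)^2} - (A \otimes I_d)^{\otimes 2} \right\}^{-1} \mathrm{vec} \begin{pmatrix} \Omega_1 & 0_{d^2 \times (p-1)d^2} \\ 0_{(p-1)d^2 \times d^2} & 0_{(p-1)d^2 \times (p-1)d^2} \end{pmatrix}. \tag{4.1}$$

Then we also obtain an alternative consistent estimator (uniformly w.r.t. $b_{kl} \in B_T$)
$\check{\Lambda}_{1\delta}$ of $\Lambda_1$ by replacing $\Omega_1$ by $\check{\Omega}_1$, and the $A_i$'s by their ALS estimates in the expression
of $A$ in (4.1).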

5. Application to the test of the linear Granger causality in mean

In this section we propose tests for linear causality in mean in our framework using
the OLS and the adaptive approaches. Let us consider the subvectors $X_{1t}$ and $X_{2t}$
such that $X_t = (X_{1t}', X_{2t}')'$, where $X_{1t}$ is of dimension $d_1 < d$ and $d_2 = d - d_1$. It is
said that $(X_{2t})$ does not linearly cause $(X_{1t})$ in mean if we have

$$EL(X_{1t} \mid X_{1t-1}, \dots) = EL(X_{1t} \mid X_{1t-1}, X_{2t-1}, \dots),$$

where $EL(X_{1t} \mid \dots)$ is the linear conditional expectation. In our framework since we
assumed that $(\epsilon_t)$ is a martingale difference, the linear predictor is optimal. Therefore
we have $EL(X_{1t} \mid \dots) = E(X_{1t} \mid \dots)$, where $E(X_{1t} \mid \dots)$ is the conditional
expectation, and we simply refer to the linear Granger causality in mean as Granger causality
in mean in the sequel. We test the null hypothesis that $(X_{2t})$ does not Granger cause
$(X_{1t})$ in mean. It is well known that this amounts to testing the null hypothesis that
$A_{i,12} = 0$ for all $1 \le i \le p$ versus the alternative that there exists $i \in \{1, \dots, p\}$ such
that $A_{i,12} \neq 0$, where the $A_{i,12}$'s are the matrices given by the $d_1$ first rows and $d_2$
last columns of the $A_i$'s (see e.g. Lütkepohl (2005)). Define the block diagonal matrix
$R = \mathrm{diag}(C, \dots, C)$ of dimension $pd_1d_2 \times pd^2$, where $C$ is a $d_1d_2 \times d^2$-dimensional
matrix given by

$$C = \begin{pmatrix} 0_{d_1 \times d_1 d} & I_{d_1} & 0_{d_1 \times d_2} & & & \\ \vdots & & & \ddots & & \\ 0_{d_1 \times d_1 d} & & & & I_{d_1} & 0_{d_1 \times d_2} \end{pmatrix}.$$

The matrix $R$ is such that we have $R\theta_0 = r$, where $r$ is the null vector of dimension
$pd_1d_2$ under the null hypothesis. Therefore the tested hypotheses can be written as

$$H_0 : R\theta_0 = 0 \quad \text{vs.} \quad H_1 : R\theta_0 \neq 0.$$

In this paper we focus on the Wald type tests, because they are the tests most commonly
used by practitioners. We first consider the ALS estimator to build tests for
Granger causality in mean. Let us introduce the adaptive Wald test statistics

$$Q_{ALS} = T \hat{\theta}_{ALS}' R' (R \check{\Lambda}_1^{-1} R')^{-1} R \hat{\theta}_{ALS} \quad \text{and} \quad Q^{\delta}_{ALS} = T \hat{\theta}_{ALS}' R' (R \check{\Lambda}_{1\delta}^{-1} R')^{-1} R \hat{\theta}_{ALS}.$$

The following proposition gives the asymptotic distribution of the ALS test statistics
as a simple consequence of Proposition 4.1. We say that a sequence of random variables
$A_T$, $T \ge 1$, converges in law to a chi-square distribution $\chi^2_n$ uniformly w.r.t. $b_{kl} \in B_T$
as $T \to \infty$, if there exists a sequence of random variables $\tilde{A}_T$, $T \ge 1$, independent of
$b_{kl} \in B_T$ such that $\tilde{A}_T \Rightarrow \chi^2_n$ and $A_T - \tilde{A}_T = o_p(1)$ uniformly w.r.t. $b_{kl} \in B_T$.

Proposition 5.1. Under the assumptions of Proposition 4.1, uniformly w.r.t. $b_{kl} \in
B_T$ as $T \to \infty$

$$Q_{ALS} \Rightarrow \chi^2_{pd_1d_2}, \tag{5.1}$$

$$Q^{\delta}_{ALS} \Rightarrow \chi^2_{pd_1d_2} \tag{5.2}$$

and

$$Q^{max}_{ALS} = \max\{Q_{ALS}, Q^{\delta}_{ALS}\} \Rightarrow \chi^2_{pd_1d_2}. \tag{5.3}$$

Based on Proposition 5.1 we propose the following procedure for testing Granger
causality in mean: for a fixed asymptotic level $\alpha$, reject the null hypothesis $H_0$ if
$\chi^2_{pd_1d_2, 1-\alpha} < Q^{max}_{ALS}$, where $\chi^2_{pd_1d_2, 1-\alpha}$ is the $(1 - \alpha)$th quantile of the $\chi^2_{pd_1d_2}$ law.
Similar procedures could be defined using $Q_{ALS}$ or $Q^{\delta}_{ALS}$ instead of $Q^{max}_{ALS}$, but the
latter statistic is expected to yield a more powerful test. The tests based on the ALS
estimation will be denoted $W_{ALS}$, $W^{\delta}_{ALS}$ and $W^{max}_{ALS}$ with obvious notations.
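
The restriction matrix $R$ and a generic Wald statistic of the form used above can be coded directly. The sketch below is illustrative: the function names are hypothetical, and `avar_hat` stands for whichever estimate of the asymptotic variance is plugged in (for instance $\check{\Lambda}_1^{-1}$ for $Q_{ALS}$).

```python
import numpy as np

def causality_restriction(d1, d2, p):
    """R = diag(C, ..., C): selects vec(A_{i,12}), i = 1, ..., p, from theta."""
    d = d1 + d2
    select = np.hstack([np.eye(d1), np.zeros((d1, d2))])        # (I_{d1}, 0_{d1 x d2})
    # First d1*d columns are zero (first d1 columns of A_i); then one selector per column.
    C = np.hstack([np.zeros((d1 * d2, d1 * d)), np.kron(np.eye(d2), select)])
    return np.kron(np.eye(p), C)

def wald_statistic(theta_hat, avar_hat, R, T):
    """Q = T (R theta)' (R V R')^{-1} (R theta), with V the estimated asymptotic variance."""
    r = R @ theta_hat
    return float(T * r @ np.linalg.solve(R @ avar_hat @ R.T, r))

# Check the selection on arbitrary coefficient matrices (d = 3, d1 = 2, d2 = 1, p = 2).
rng = np.random.default_rng(4)
A_list = [rng.standard_normal((3, 3)) for _ in range(2)]
theta = np.concatenate([A.flatten(order="F") for A in A_list])
R = causality_restriction(2, 1, 2)
selected = R @ theta
Q_demo = wald_statistic(theta, np.eye(theta.size), R, T=100)
```

The statistic would then be compared with the $(1 - \alpha)$th quantile of the $\chi^2_{pd_1d_2}$ law, which can be obtained for example from `scipy.stats.chi2.ppf(1 - alpha, df=R.shape[0])`.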

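Let us now consider the following Wald test statistics based on the OLS estimation

$$Q_{OLS} = T \hat{\theta}_{OLS}' R' (R \hat{\Lambda}_3^{-1} \hat{\Lambda}_2 \hat{\Lambda}_3^{-1} R')^{-1} R \hat{\theta}_{OLS},$$

$$Q^{\delta}_{OLS} = T \hat{\theta}_{OLS}' R' (R \hat{\Lambda}_{3\delta}^{-1} \hat{\Lambda}_{2\delta} \hat{\Lambda}_{3\delta}^{-1} R')^{-1} R \hat{\theta}_{OLS},$$

and the commonly used standard Wald test statistic

$$Q_S = T \hat{\theta}_{OLS}' R' (R \hat{J}^{-1} R')^{-1} R \hat{\theta}_{OLS}, \quad \text{with} \quad \hat{J} = T^{-1} \sum_{t=1}^{T} \tilde{X}_{t-1} \tilde{X}_{t-1}' \otimes \hat{\Sigma}_u^{-1}.$$

The following proposition gives the asymptotic behavior of the OLS and standard test
statistics.

Proposition 5.2. Under A1 we have as $T \to \infty$

$$Q_{OLS} \Rightarrow \chi^2_{pd_1d_2}, \tag{5.4}$$

$$Q^{\delta}_{OLS} \Rightarrow \chi^2_{pd_1d_2}, \tag{5.5}$$

$$Q^{max}_{OLS} = \max\{Q_{OLS}, Q^{\delta}_{OLS}\} \Rightarrow \chi^2_{pd_1d_2}, \tag{5.6}$$

and

$$Q_S \Rightarrow Z(\delta) := \sum_{i=1}^{pd_1d_2} \kappa_i Z_i^2, \tag{5.7}$$

where the $Z_i$'s are independent $\mathcal{N}(0, 1)$ variables, $\delta = (\kappa_1, \dots, \kappa_{pd_1d_2})'$ is the vector of
the eigenvalues of the matrix

$$\Psi = (R J^{-1} R')^{-1/2} (R \Lambda_3^{-1} \Lambda_2 \Lambda_3^{-1} R') (R J^{-1} R')^{-1/2}, \tag{5.8}$$

with

$$J = \int_0^1 \sum_{i=0}^{\infty} \left\{ \tilde{\psi}_i (\mathbf{1}_{p \times p} \otimes \Sigma(r)) \tilde{\psi}_i' \right\} dr \otimes \left( \int_0^1 \Sigma(r) dr \right)^{-1}.$$

It is easy to see from (3.8) and (3.10) that $\hat{J}$ is a consistent estimator of $J$. The
results (5.4), (5.5) and (5.7) are direct consequences of Propositions 3.1 and 3.2, so that
the proof is omitted. In the Appendix we only give the proof of (5.6). Similarly to the
tests built using the ALS approach, tests using the results (5.4), (5.5) and (5.6) can be
proposed.

When the errors are homoscedastic ($\Sigma_t = \Sigma_u$ for all $t$), we obtain $J = E[\tilde{X}_t \tilde{X}_t'] \otimes
\Sigma_u^{-1}$. Recall that in this case we also have $\Lambda_3^{-1} \Lambda_2 \Lambda_3^{-1} = \{E[\tilde{X}_t \tilde{X}_t']\}^{-1} \otimes \Sigma_u$, so that we
obtain $\Psi = I_{pd_1d_2}$ and hence we retrieve the standard result $Q_S \Rightarrow \chi^2_{pd_1d_2}$. However the
$\kappa_i$'s in (5.7) can be quite different from 1 if the volatility of the errors is not constant,
as illustrated in the following example.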

Example 5.1. Consider the bivariate VAR(1) process $X_t = A X_{t-1} + u_t$ with true
parameter $A = 0$. Such a model may be used to test Granger causality in mean between
the components of an uncorrelated process. Like in Example 3.1 let us take

$$\Sigma(r) = \begin{pmatrix} \Sigma_1(r) & 0 \\ 0 & \Sigma_2(r) \end{pmatrix}.$$

Suppose that one is interested in testing if $(X_{2t})$ Granger causes $(X_{1t})$ in mean. Then
$R = (0, 0, 1, 0)$ and the matrix $\Psi$ is a scalar such that

$$\Psi = \left( \int_0^1 \Sigma_1(r) dr \right)^{-1} \times \left( \int_0^1 \Sigma_2(r) dr \right)^{-1} \times \int_0^1 \Sigma_1(r) \Sigma_2(r) dr.$$

As a consequence the sum in (5.7) reduces to a single term corresponding to the
coefficient $\kappa_1 = \Psi$. If we suppose that the error process is homoscedastic, we obtain
$\kappa_1 = 1$. However in the general heteroscedastic case we have $\kappa_1 \neq 1$. To illustrate this
let us take

$$\Sigma_1(r) = \sigma_{10}^2 + (\sigma_{11}^2 - \sigma_{10}^2) r^q$$

and

$$\Sigma_2(r) = \sigma_{20}^2 + (\sigma_{21}^2 - \sigma_{20}^2) r^q$$

as in Example 2 of Xu and Phillips (2008). The values of $\kappa_1$ are plotted in Figure
7.2 for $q = 1$, $\sigma_{10}^2 = \sigma_{20}^2 = 1$ and $\sigma_{11}^2, \sigma_{21}^2 \in [0.25, 16]$. It can be seen that in the
heteroscedastic case $\kappa_1$ can be quite different from 1 and therefore in this case using
the standard Wald procedure based on $Q_S$ for testing if $(X_{2t})$ Granger causes $(X_{1t})$ in
mean could be quite a bad idea.
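
The coefficient $\kappa_1$ of this example is a ratio of three integrals and can be evaluated with a simple quadrature. The sketch below is only illustrative: the function name `kappa1`, its argument names (which stand for the variances $\sigma_{11}^2$, $\sigma_{21}^2$, $\sigma_{10}^2$, $\sigma_{20}^2$) and the midpoint rule are assumptions made here.

```python
import numpy as np

def kappa1(s11, s21, s10=1.0, s20=1.0, q=1.0, n=200_000):
    """kappa_1 = int(S1*S2) / (int(S1) * int(S2)) for S_k(r) = s_k0 + (s_k1 - s_k0) r^q."""
    r = (np.arange(n) + 0.5) / n                  # midpoint rule on (0, 1)
    S1 = s10 + (s11 - s10) * r ** q
    S2 = s20 + (s21 - s20) * r ** q
    return np.mean(S1 * S2) / (np.mean(S1) * np.mean(S2))

k_homo = kappa1(1.0, 1.0)      # homoscedastic errors
k_both_up = kappa1(16.0, 16.0) # both variances increase along the sample
k_opposite = kappa1(16.0, 0.25)  # one variance increases, the other decreases
```

For $q = 1$ the integrals are elementary, and a direct calculation gives $\kappa_1 = 1 + \frac{ab/12}{(1 + a/2)(1 + b/2)}$ with $a = \sigma_{11}^2 - 1$ and $b = \sigma_{21}^2 - 1$. Hence $\kappa_1 > 1$ when both variances move in the same direction and $\kappa_1 < 1$ when they move in opposite directions, so the standard test can be either oversized or undersized.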

The tests based on the results (5.4), (5.5) and (5.6) will be denoted $W_{OLS}$, $W^{\delta}_{OLS}$,
$W^{max}_{OLS}$, and the standard test based on the statistic $Q_S$ and the $\chi^2_{pd_1d_2}$ distribution will
be denoted $W_S$.

6. Monte Carlo experiments 

The finite sample properties of the OLS, GLS and ALS estimating approaches for
VAR analysis are illustrated in this section. Bivariate AR(1) processes are simulated
using the model $X_t = A X_{t-1} + u_t$ with

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad \text{and} \quad u_t = H_t \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, I_2) \ \text{iid}, \qquad (6.1)$$

and taking $a_{21} = 0.1$ in all the experiments. In addition, if $a_{12} \neq 0$, note that
$(X_{2t})$ Granger causes $(X_{1t})$ in mean. The errors are iid standard Gaussian in the
homoscedastic case. We considered this case to study the consequences of using
the methods for VAR analysis introduced in this paper when the innovations
process is in fact homoscedastic. In the heteroscedastic case the volatility structure is
given by

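$$\Sigma(r) = \begin{pmatrix} (1 + \gamma_1 r)(1 + \rho^2) & \rho (1 + \gamma_1 r)^{1/2} (1 + \gamma_2 r)^{1/2} \\ \rho (1 + \gamma_1 r)^{1/2} (1 + \gamma_2 r)^{1/2} & (1 + \gamma_2 r) \end{pmatrix},$$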

so that the variances of the error components have a trending behaviour. In the sequel
we set $\rho = 0.6$ and $\gamma_1 = 20$, $\gamma_2 = \gamma_1/3$. For the ALS approach the bandwidth is
chosen by cross-validation in a given range as described in A2, and we take $\nu_T = 0$
in all the experiments. In the sequel the results for the GLS estimation are given
for comparison only since this method is not feasible in practice. In each experiment
$N = 1000$ independent trajectories are simulated using (6.1).
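The simulation design can be sketched in Python as follows. This is only an illustration under stated assumptions: the factor $H_t$ is taken to be the Cholesky factor of $\Sigma(t/T)$, the process is started at $X_0 = 0$, and the helper names `sigma`, `simulate_var1` and `ols_var1` are hypothetical. The original experiments may have used a different factorization, initialization or software.

```python
import numpy as np


def sigma(r, rho=0.6, gamma1=20.0, gamma2=20.0 / 3.0):
    """Trending covariance matrix Sigma(r) of the heteroscedastic design."""
    v1 = 1.0 + gamma1 * r
    v2 = 1.0 + gamma2 * r
    off = rho * np.sqrt(v1 * v2)
    return np.array([[v1 * (1.0 + rho**2), off], [off, v2]])


def simulate_var1(A, T, heteroscedastic=True, seed=0):
    """Simulate X_t = A X_{t-1} + u_t with u_t = H_t eps_t, started at X_0 = 0."""
    rng = np.random.default_rng(seed)
    X = np.zeros((T + 1, 2))
    for t in range(1, T + 1):
        # Assumption: H_t is the Cholesky factor of Sigma(t/T); identity if homoscedastic.
        H = np.linalg.cholesky(sigma(t / T)) if heteroscedastic else np.eye(2)
        X[t] = A @ X[t - 1] + H @ rng.standard_normal(2)
    return X[1:]


def ols_var1(X):
    """OLS estimate of A from the regression of X_t on X_{t-1}."""
    Y, Z = X[1:], X[:-1]
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # solves Z @ coef ~= Y
    return coef.T


A = np.array([[0.2, 0.0], [0.1, 0.2]])  # a12 = 0: no causality from X2 to X1
X = simulate_var1(A, T=100)
A_hat = ols_var1(X)
```

Repeating the last three lines over 1000 seeds and averaging the squared estimation errors would give a Monte Carlo RMSE in the spirit of this section.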



We first examine the properties of the estimation methods presented in the previous
sections. The Root Mean Squared Error (RMSE) of the OLS, ALS and GLS estimators of
the autoregressive parameters is considered in Figures 7.3 to 7.7. In these experiments
only $a_{11} = a_{22}$ vary and we set $a_{21} = 0.1$, $a_{12} = 0$. The length of the simulated series is
$T = 100$. For homoscedastic errors we only give the results for the estimators of $a_{11}$ in
Figure 7.3. As expected the infeasible GLS estimation outperforms the other methods
in all cases except when the errors are homoscedastic. In this case (Figure 7.3) the
different estimation methods are equivalent and hence give similar results. This can
also be explained by the fact that the ALS based methods choose large bandwidths
when the errors are homoscedastic and are then similar to the OLS estimation. However
in presence of heteroscedasticity (Figures 7.4 to 7.7) the ALS procedure clearly
estimates the autoregressive parameters better than the OLS estimation.

In this part we study the empirical size of the Wald tests under comparison.
Therefore we take $a_{12} = 0$, so that $(X_{2t})$ does not cause $(X_{1t})$ in mean. We set $a_{11} = a_{22} =
0.2$. The simulated processes are of lengths $T = 50$, $T = 100$, $T = 200$ and $T = 400$.
We test the null hypothesis $a_{12} = 0$ at the asymptotic nominal level 5% in Tables
1 and 2. Several other values of the autoregressive parameters and specifications of the
heteroscedasticity, not reported here, were tried and lead to conclusions similar
to those of the presented cases. Since $N = 1000$ replications are performed and assuming
that the finite sample size of the tests is 5%, the relative rejection frequencies should
be between the significance limits 3.65% and 6.35% with probability 0.95. The
relative rejection frequencies are therefore displayed in bold type when they are outside these
significance limits. We first compare the $W_{OLS}$, $W_S$, $W_{ALS}$ and $W_{GLS}$ tests. In Table 1
the homoscedastic case is considered. It emerges that the $W_{GLS}$ test outperforms the
other tests for $T = 50$. We also remark that the $W_S$ test has better results than the
$W_{ALS}$ and $W_{OLS}$ tests for $T = 50$. In general the relative rejection frequencies of the
different tests are quickly close to the asymptotic nominal level ($T = 100$). Therefore,
when in doubt about the presence of unconditional heteroscedasticity in the error terms,
one can use the ALS and OLS tests without a major loss of efficiency. In Table 2 we
use heteroscedastic processes. In accordance with our theoretical results, the $W_S$ test
is not valid. The relative rejection frequencies of the $W_{OLS}$, $W_{ALS}$ and $W_{GLS}$ tests
converge to the asymptotic nominal level as the sample size increases. However we note
that the $W_{ALS}$ test has better results than the $W_{OLS}$ test for $T = 50$. It also appears
that the infeasible $W_{GLS}$ test has a better control of the error of the first kind than the
other tests for T = 50. From Tables [1] and [2] the tests Wq^q, W^^^ and W^^g are 
more liberal than the other tests for small samples. 
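The significance limits 3.65% and 6.35% quoted above follow from the normal approximation to the binomial distribution of the rejection frequency over $N = 1000$ replications. A minimal Python check, where the function name `rejection_band` and the use of the rounded quantile 1.96 are choices made for this illustration:

```python
import math


def rejection_band(nominal=0.05, n_rep=1000, z=1.96):
    """Normal-approximation 95% band for a rejection frequency under the null."""
    half_width = z * math.sqrt(nominal * (1.0 - nominal) / n_rep)
    return nominal - half_width, nominal + half_width


lo, hi = rejection_band()
print(f"{100 * lo:.2f}% to {100 * hi:.2f}%")  # prints "3.65% to 6.35%"
```

Rejection frequencies outside this band are therefore unlikely to arise from simulation noise alone if the true size is 5%.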

A further set of Monte Carlo experiments has been conducted to analyze the empirical
power of the studied tests. We again simulate $N = 1000$ independent trajectories of
bivariate AR(1) processes obtained using (6.1), where we set $a_{11} = a_{22} = 0.2$. We take
$a_{12} \neq 0$ such that the stability condition in (2.1) holds. Hence $(X_{2t})$ is Granger causal
in mean for $(X_{1t})$ in this part. The null hypothesis $a_{12} = 0$ is tested at the asymptotic
nominal level 5%. We only consider samples of length $T = 100$. The results are given
in Table 3 for the homoscedastic case and in Table 4 for the heteroscedastic case. From
our simulation results we remark that the different tests have the same power in the
homoscedastic case. In the heteroscedastic case the $W_{GLS}$ tests are more powerful
than the other tests. It emerges that the ALS tests are more powerful than the OLS
tests in presence of unconditional heteroscedasticity. This can be explained by the fact
that the ALS tests are slightly more sophisticated than the OLS tests. We also note 
a substantial gain of power for the WqI^, W^Jils ^^d WqI^ when compared to the 
other tests. 



We can draw the conclusion that when the process is non stationary but stable, the
ALS estimation procedure gives significant improvements in the estimation of VAR
models when compared to the standard OLS estimation method. We also noted
significant improvements of the ALS based tests in the analysis of the linear Granger
causality when compared to the OLS based tests in the unconditionally heteroscedastic
case. Indeed from our simulation results the ALS tests have a better control of the
error of the first kind and a greater ability to detect the linear causality in mean than the
OLS based tests in our framework. As expected we found that the standard Wald test
is not reliable for testing autoregressive parameter restrictions when the process is
stable but not stationary.

7. Illustrative example 

Now we turn to an example taken from U.S. financial data. An application to
the quarterly U.S. balance on services and balance on merchandise trade in billions of
dollars, from January 1, 1970 to October 1, 2009 is considered. The series are available
seasonally adjusted from the website of the research division of the Federal Reserve Bank
of Saint Louis: www.research.stlouisfed.org. The series are plotted in Figure 7.8.

We first study the properties of the processes. The existence of a unit
root for each series is tested using the procedure proposed by Beare (2008). The
Augmented Dickey-Fuller (ADF) statistic is 3.05 for the merchandise trade balance
data and 7.62 for the services balance data. These statistics are greater than the 5%
critical value -1.94 of the ADF test. Therefore the stability hypothesis has to
be rejected for both series. In addition we also considered the Kolmogorov-Smirnov
(KS) test for homoscedasticity proposed by Cavaliere and Taylor (2008). We found
that the KS statistic is 3.05 for the merchandise trade balance data and is 7.62 for the
services balance data. Since the KS statistics are greater than the 5% KS critical value
1.36, the homoscedasticity hypothesis is rejected for the studied series. Hence the first
differences of the series (plotted in Figure 7.9) are considered in the sequel. The length
of the series is $T = 159$.

We fitted a VAR(1) model to the series. The OLS and ALS estimators of the
autoregressive parameters are given in Table 5, where the standard deviations obtained
using the results (3.2) and (3.3) are given in brackets. The standard deviations
obtained using the standard result (3.4) are also given to illustrate the case where the
practitioner supposes that the processes are homoscedastic. In accordance with our
theoretical results we find that the ALS estimators are more precise than the OLS
estimators. In view of the results of the KS test one can conclude that the standard
deviations based on the homoscedasticity assumption are not reliable. If a single
bandwidth is used for the ALS estimation, b = 7.67x 10^^ is selected by cross-validation 
in a given range and using 200 grid points (Figure 7.10). If several bandwidths are used
for the ALS estimation, bn = 1.11 x 10"^, 612 = 1.87 x 10"^ a^-^(J 5^2 = 1.33 x 10"^ are
selected in a similar way. From Figure 7.11 the ALS residuals seem homoscedastic.
Therefore we can deduce that the unconditional heteroscedasticity is well estimated
by the adaptive procedure. We also considered the ARCH-LM test of Engle (1982) 
with different lags in Table IH] for testing the presence of ARCH effects in the ALS 
residuals. It appears that the null hypothesis of conditional homoscedasticity cannot 
be rejected at the level 5%. These diagnostics give some evidence that the conditional 
homoscedasticity assumption on (et) is plausible in our case. However the OLS residuals 
are clearly heteroscedastic. It is well known that the test of Granger causality in
mean strongly depends on the specification of the autoregressive order. Since the
standard Box-Pierce procedure is not valid in our framework, we use the modified
portmanteau tests proposed by Patilea and Raïssi (2010) to check the goodness-of-fit
of the VAR(1) model. We use the Ljung-Box statistic. The tests based on the OLS and
ALS estimation are respectively denoted $LB_{OLS}$ and $LB_{ALS}$. We also give the result
of the standard Ljung-Box test, denoted $LB_S$, to illustrate the testing procedure of the
Granger causality in mean using only standard tools. The number of autocorrelations
used for the portmanteau test statistics is $m = 5$ and $m = 15$. From Table 7 it appears that
the modified portmanteau tests do not reject the hypothesis of uncorrelated OLS and
ALS residuals. The unreliable standard portmanteau tests clearly reject the hypothesis
of uncorrelated OLS residuals, so that the practitioner is likely to select a greater
autoregressive order. Note that an over-specified VAR model can entail a loss of efficiency
in the test of Granger causality in mean. The reader is referred to Thornton and Batten
(1985) for a discussion on the model selection for testing Granger causality in mean.

From the above analysis it appears that the linear dynamics of the series are well 
described by the VAR(l) model. Hence we can now analyze Granger causality in 
mean, for instance, from the balance on services to the balance on merchandise trade. 
In Table |5] we see that the p-value of the Ws test is close to zero, so that the null 
hypothesis of no Granger causality is clearly rejected. However we can remark that the 
p-values of the $W_{ALS}$ and $W_{OLS}$ tests are far from zero. Therefore the null hypothesis
is not rejected by the modified tests. These contradictory results can be explained by
the fact that the $W_S$ test does not take into account the probable presence
of unconditionally heteroscedastic errors in the data, in contrast to the modified
tests which have a larger theoretical basis. It can be also noted in Table IH] that the
different test statistics are quite different. 

Appendix: Proofs 

We first state some intermediate results. Define the linear processes

$$\vartheta_t = \sum_{i=0}^{\infty} C_i u^k_{t-i} \quad \text{and} \quad \zeta_t = \sum_{i=0}^{\infty} D_i u^q_{t-i},$$

where the components of the $C_i$'s and $D_i$'s are absolutely summable. The vector $u^k_t$ is
given by $u^k_t = 1_k \otimes u_t$, where $1_k$ is the vector of ones of dimension $k$. Let us introduce
$v_t = \mathrm{vec}(\vartheta_t \zeta_t')$. The following lemmas extend results obtained in Xu and Phillips
(2008) and Phillips and Xu (2005) to the multivariate case.

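Lemma 7.1. (a) If $\sup_{1 \le t \le T} (\| u_{it} \|_{2\mu}) < \infty$, $1 \le \mu < \infty$, for all $i \in \{1, \dots, d\}$,
then we have $\sup_{1 \le t \le T} (\| v_{nt} \|_{\mu}) < \infty$.

(b) If $\sup_{1 \le t \le T} (\| u_{it} \|_{4\mu}) < \infty$, $1 \le \mu < \infty$, for all $i \in \{1, \dots, d\}$, then we have
$\sup_{1 \le t \le T} (\| \vartheta_{jt} \|_{4\mu}) < \infty$ for all $j \in \{1, \dots, kd\}$.

(c) If $\sup_{1 \le t \le T} (\| u_{it} \|_{4\mu}) < \infty$, $1 \le \mu < \infty$, for all $i \in \{1, \dots, d\}$, then we have
$\sup_{1 \le t \le T} (\| \vartheta_{jt-1} \vartheta_{lt-1} u_{j't} u_{l't} \|_{\mu}) < \infty$ for all $j, j', l, l' \in \{1, \dots, kd\}$.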

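Proof of Lemma 7.1. For the proof of (a) we write

$$v_t = \sum_{j,l=0}^{\infty} \mathrm{vec}\left( C_j u^k_{t-j} u^{q\prime}_{t-l} D_l' \right) = \sum_{j,l=0}^{\infty} (D_l \otimes C_j)\, \mathrm{vec}\left( u^k_{t-j} u^{q\prime}_{t-l} \right) = \sum_{j,l=0}^{\infty} (D_l \otimes C_j)(u^q_{t-l} \otimes u^k_{t-j}).$$

Then noting that $E | u_{at-l} u_{bt-j} |^{\mu} \le (E | u_{at-l} |^{2\mu} E | u_{bt-j} |^{2\mu})^{1/2} < \infty$, we have for
each component $v_{nt}$ of $v_t$

$$E | v_{nt} |^{\mu} = \| v_{nt} \|_{\mu}^{\mu} \le \left( \sum_{j,l=0}^{\infty} \| D_l \otimes C_j \|_{\infty} \sup_{a,b \in \{1,\dots,d\}} \| u_{at-l} u_{bt-j} \|_{\mu} \right)^{\mu} < \infty, \qquad (7.1)$$

where the norm $\| \cdot \|_{\infty}$ is given by $\| F \|_{\infty} = \max_{i,j} |f_{ij}|$ for a square matrix $F$
with obvious notations. Relation (7.1) holds since

$$\sum_{j,l=0}^{\infty} \| D_l \otimes C_j \|_{\infty} \le \sum_{j,l=0}^{\infty} \| D_l \|_{\infty} \| C_j \|_{\infty} = \left\{ \sum_{l=0}^{\infty} \| D_l \|_{\infty} \right\} \left\{ \sum_{j=0}^{\infty} \| C_j \|_{\infty} \right\} < \infty.$$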
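For the proof of (b) we write

$$E | \vartheta_{jt} |^{4\mu} = \| \vartheta_{jt} \|_{4\mu}^{4\mu} \le \left( \sum_{i=0}^{\infty} \| C_i \|_{\infty} \sup_{a \in \{1,\dots,d\}} \| u_{at-i} \|_{4\mu} \right)^{4\mu} < \infty,$$

from the Minkowski inequality, so that (b) holds.
For the proof of (c) we have

$$\| \vartheta_{jt-1} \vartheta_{lt-1} u_{j't} u_{l't} \|_{\mu}^{\mu} = E | \vartheta_{jt-1} \vartheta_{lt-1} u_{j't} u_{l't} |^{\mu} \le \left( E | \vartheta_{jt-1} |^{4\mu} E | \vartheta_{lt-1} |^{4\mu} E | u_{j't} |^{4\mu} E | u_{l't} |^{4\mu} \right)^{1/4} < \infty,$$

from the Cauchy-Schwarz inequality, so that (c) holds. □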
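Lemma 7.2. Under A1 we have

$$\lim_{T \to \infty} E\left[ \vartheta_{[Tr]-1} \zeta_{[Tr]-1}' \right] = \sum_{i=0}^{\infty} C_i \{ 1_{k \times q} \otimes \Sigma(r) \} D_i', \qquad (7.2)$$

for values $r \in (0,1]$ at which the functions $g_{ij}(r)$ are continuous, and where $1_{k \times q}$ is
the matrix of ones of dimension $k \times q$.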

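Proof of Lemma 7.2. Let us define the vector $\epsilon^k_t = 1_k \otimes \epsilon_t$. Using the well known
identity $(B \otimes C)(D \otimes F) = (BD) \otimes (CF)$ for matrices of appropriate dimensions, we
have

$$\begin{aligned}
E[\vartheta_{t-1} \zeta_{t-1}'] &= E\left[ \left\{ \sum_{i=0}^{\infty} C_i u^k_{t-i-1} \right\} \left\{ \sum_{i=0}^{\infty} D_i u^q_{t-i-1} \right\}' \right] \\
&= \sum_{i=0}^{\infty} C_i E\left[ u^k_{t-i-1} u^{q\prime}_{t-i-1} \right] D_i' \\
&= \sum_{i=0}^{\infty} C_i E\left[ (1_k \otimes u_{t-i-1})(1_q \otimes u_{t-i-1})' \right] D_i' \\
&= \sum_{i=0}^{\infty} C_i E\left[ (1_k 1_q') \otimes (u_{t-i-1} u_{t-i-1}') \right] D_i' \\
&= \sum_{i=0}^{\infty} C_i \{ 1_{k \times q} \otimes \Sigma_{t-i-1} \} D_i' \\
&= \sum_{i=0}^{\infty} C_i \{ 1_{k \times q} \otimes \Sigma((t-i-1)/T) \} D_i'.
\end{aligned}$$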
Now let us write†

$$E[\vartheta_{[Tr]-1} \zeta_{[Tr]-1}'] = \sum_{i=0}^{m} C_i \{ 1_{k \times q} \otimes \Sigma(([Tr]-i-1)/T) \} D_i' + \sum_{i=m+1}^{\infty} C_i \{ 1_{k \times q} \otimes \Sigma(([Tr]-i-1)/T) \} D_i',$$

with $m = m(T) \to \infty$ and $m/T \to 0$. Therefore noting that $\Sigma(([Tr]-i-1)/T) \to \Sigma(r)$
as $T \to \infty$, with $i \le m$, we obtain

$$\sum_{i=0}^{m} C_i \{ 1_{k \times q} \otimes \Sigma(([Tr]-i-1)/T) \} D_i' \to \sum_{i=0}^{\infty} C_i \{ 1_{k \times q} \otimes \Sigma(r) \} D_i'.$$

Since we assumed that $\sup_{r \in (0,1]} |g_{ij}(r)| < \infty$ we also have

$$\sum_{i=m+1}^{\infty} C_i \{ 1_{k \times q} \otimes \Sigma(([Tr]-i-1)/T) \} D_i' \to 0$$

as $T \to \infty$, so that we obtain the result (7.2). □

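Recall that we have defined $v_t = \mathrm{vec}(\vartheta_t \zeta_t')$. We also introduce $y_t = \mathrm{vec}(\vartheta_{t-1} \vartheta_{t-1}' \otimes
\Sigma_t^{-1} u_t u_t' \Sigma_t^{-1})$ and $z_t = \mathrm{vec}(\vartheta_{t-1} \vartheta_{t-1}' \otimes u_t u_t')$.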



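Lemma 7.3. Under A1 we have

$$T^{-1} \sum_{t=1}^{T} v_t \xrightarrow{P} \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} E(v_t), \qquad (7.3)$$

$$T^{-1} \sum_{t=1}^{T} y_t \xrightarrow{P} \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} E(y_t) = \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} \mathrm{vec}\left\{ E(\vartheta_{t-1} \vartheta_{t-1}') \otimes \Sigma_t^{-1} \right\}, \qquad (7.4)$$

$$T^{-1} \sum_{t=1}^{T} z_t \xrightarrow{P} \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} E(z_t) = \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} \mathrm{vec}\left\{ E(\vartheta_{t-1} \vartheta_{t-1}') \otimes \Sigma_t \right\}. \qquad (7.5)$$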



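Proof of Lemma 7.3. We first show that $(\vartheta_t)$ is $L^2$-Near Epoch Dependent (NED)
on $(\epsilon_t)$. We write for $l > 0$

$$\vartheta_t = \sum_{i=0}^{\infty} C_i u^k_{t-i} = \sum_{i=0}^{l} C_i u^k_{t-i} + \sum_{i=l+1}^{\infty} C_i u^k_{t-i}.$$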



† Here we make a common abuse of notation because in Assumption A1 the matrix-valued function
$\Sigma(\cdot)$ is not defined for negative values. To remedy this problem it suffices to extend the function $\Sigma(\cdot)$
to the left of the origin, for instance by setting $\Sigma(r)$ equal to the identity matrix if $r \le 0$.
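Let $\mathcal{F}_{t-l}^{t+l}$ be the $\sigma$-field generated by $\{u_{t-l}, \dots, u_{t+l}\}$, then

$$E(\vartheta_t \mid \mathcal{F}_{t-l}^{t+l}) = \sum_{i=0}^{l} C_i u^k_{t-i} + \sum_{i=l+1}^{\infty} C_i E(u^k_{t-i} \mid \mathcal{F}_{t-l}^{t+l}).$$

From the Minkowski inequality we have, for each component $\vartheta_{nt}$ of $\vartheta_t$,

$$\| \vartheta_{nt} - E(\vartheta_{nt} \mid \mathcal{F}_{t-l}^{t+l}) \|_2 \le \left\{ \sum_{i=1}^{\infty} \| C_{l+i} \|_{\infty} \right\} \sup_{i,j} \| u_{jt-l-i} \|_2 + \left\{ \sum_{i=1}^{\infty} \| C_{l+i} \|_{\infty} \right\} \sup_{i,j} \| E(u_{jt-l-i} \mid \mathcal{F}_{t-l}^{t+l}) \|_2.$$

Since we have

$$\| u_{jt-l-i} \|_2 = \{ E(u_{jt-l-i}^2) \}^{1/2} \ge \{ E( E(u_{jt-l-i} \mid \mathcal{F}_{t-l}^{t+l})^2 ) \}^{1/2}$$

from the conditional Jensen inequality, we obtain

$$\| \vartheta_{nt} - E(\vartheta_{nt} \mid \mathcal{F}_{t-l}^{t+l}) \|_2 \le 2 \left\{ \sum_{i=1}^{\infty} \| C_{l+i} \|_{\infty} \right\} \sup_{i,j} \| u_{jt-l-i} \|_2.$$

Therefore noting that we have $\sup_{i,j} \| u_{jt-l-i} \|_2 < \infty$ and since $\sum_{i=1}^{\infty} \| C_{l+i} \|_{\infty} \to 0$
as $l \to \infty$, it is clear that $(\vartheta_t)$ is $L^2$-NED on $(\epsilon_t)$. Similarly it can be shown that $(\zeta_t)$
is $L^2$-NED on $(\epsilon_t)$.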

From Theorem 17.9 in Davidson (1994), it follows that the process $\{v_t - E(v_t)\}$ is
$L^1$-NED. Therefore since we assumed that $\epsilon_t$ is $\alpha$-mixing, and using Theorem 17.5 in
Davidson (1994), $\{v_t - E(v_t)\}$ is an $L^1$-mixingale on $(\epsilon_t)$. In addition using Lemma 7.1
with $\mu = 2$, we see that $v_t$ is uniformly integrable. Then from the law of large numbers
for $L^1$-mixingales of Andrews (1988), we obtain (7.3).

For the proof of ((7^ note that vec {■dt-i'&t^i <E) J::,r^utu'f.j::ir^ - E{dt-i'd't^i <E) 
E(~ UjUjSj" )} are martingale differences and uniformly integrable from result (c) of 
Lemma [7. II Therefore we obtain 
$$T^{-1} \sum_{t=1}^{T} \vartheta_{t-1} \vartheta_{t-1}' \otimes \Sigma_t^{-1} u_t u_t' \Sigma_t^{-1} \xrightarrow{P} \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} E(\vartheta_{t-1} \vartheta_{t-1}') \otimes \Sigma_t^{-1}$$

using arguments similar to those of the proof of (7.3). The proof of (7.5) is similar to
that of (7.4). □



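Lemma 7.4. Under A1 we have

$$T^{-1} \sum_{t=1}^{T} \vartheta_{t-1} \vartheta_{t-1}' \otimes \Sigma_t^{-1} \xrightarrow{P} \int_0^1 \sum_{i=0}^{\infty} \{ C_i (1_{k \times k} \otimes \Sigma(r)) C_i' \} \otimes \Sigma(r)^{-1}\,dr, \qquad (7.6)$$

$$T^{-1} \sum_{t=1}^{T} \vartheta_{t-1} \vartheta_{t-1}' \otimes I_d \xrightarrow{P} \int_0^1 \sum_{i=0}^{\infty} \{ C_i (1_{k \times k} \otimes \Sigma(r)) C_i' \}\,dr \otimes I_d. \qquad (7.7)$$

In addition we also have

$$T^{-1/2} \sum_{t=1}^{T} \mathrm{vec}\left( \Sigma_t^{-1} u_t \vartheta_{t-1}' \right) \Rightarrow \mathcal{N}(0, \Xi_1), \qquad (7.8)$$

$$T^{-1/2} \sum_{t=1}^{T} \mathrm{vec}\left( u_t \vartheta_{t-1}' \right) \Rightarrow \mathcal{N}(0, \Xi_2), \qquad (7.9)$$

where

$$\Xi_1 = \int_0^1 \sum_{i=0}^{\infty} \{ C_i (1_{k \times k} \otimes \Sigma(r)) C_i' \} \otimes \Sigma(r)^{-1}\,dr,$$

and

$$\Xi_2 = \int_0^1 \sum_{i=0}^{\infty} \{ C_i (1_{k \times k} \otimes \Sigma(r)) C_i' \} \otimes \Sigma(r)\,dr.$$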

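Proof of Lemma 7.4. For the proof of (7.6) we have from Lemma 7.3

$$T^{-1} \sum_{t=1}^{T} \vartheta_{t-1} \vartheta_{t-1}' \otimes \Sigma_t^{-1} \xrightarrow{P} \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} E(\vartheta_{t-1} \vartheta_{t-1}') \otimes \Sigma_t^{-1}.$$

Now let us denote the discontinuity points of the functions $g_{ij}(\cdot)$ by $\xi_1, \xi_2, \dots, \xi_q$ where
$q$ is a finite number independent of $T$. We write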
g is a finite number independent of T. We write 
T

lim T-iy s(^t_ii^;_i)®s,-i 

t=l 

T ,(t+l)/T 
= ^1™ ZZ / E{&[Tr]-l^[Tr]^l)®^TrA + Opil) 

.(T+l)/T 
■■■+ / E{d[Tr]-l^[Tr]-l)(^^[Tr]dr + Opil), 

SO that using Lemma 17.21 we obtain 



r-i^i^t_i<_i®E,-i= / ^{a(lpxp®SW)Q'}'^S(r)-idr + Op(l). (7.10) 

The proof of (7.7) is similar. For the proof of (7.8), using the identities $\mathrm{vec}(ab') = b \otimes a$
and $\mathrm{vec}(BJF) = (F' \otimes B)\mathrm{vec}(J)$ for matrices $B$, $J$, $F$ of appropriate dimensions and
vectors $a$, $b$, we write

$$\mathrm{vec}\{ \Sigma_t^{-1} u_t \vartheta_{t-1}' \} = (I_{dp} \otimes \Sigma_t^{-1})(\vartheta_{t-1} \otimes u_t).$$

Then using again the identity $(B \otimes C)(D \otimes F) = (BD) \otimes (CF)$ we have

$$\mathrm{vec}(\Sigma_t^{-1} u_t \vartheta_{t-1}')\, \mathrm{vec}(\Sigma_t^{-1} u_t \vartheta_{t-1}')' = \vartheta_{t-1} \vartheta_{t-1}' \otimes \Sigma_t^{-1} u_t u_t' \Sigma_t^{-1}.$$

Therefore we obtain the result (7.8) from (7.4), (7.6) and by the Lindeberg central
limit theorem. Using (7.5), the proof of (7.9) is similar to that of (7.8). □
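The vec and Kronecker product identities used in this proof can be checked numerically. The short Python sketch below does so on random matrices; the helper `vec`, the dimensions and the particular matrix standing in for $\Sigma_t$ are arbitrary choices made for this illustration.

```python
import numpy as np


def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")


rng = np.random.default_rng(0)
a, b = rng.standard_normal(3), rng.standard_normal(4)
B, J, F = (rng.standard_normal(s) for s in [(2, 3), (3, 4), (4, 5)])

# vec(a b') = b (x) a
check_outer = np.allclose(vec(np.outer(a, b)), np.kron(b, a))

# vec(B J F) = (F' (x) B) vec(J)
check_vec = np.allclose(vec(B @ J @ F), np.kron(F.T, B) @ vec(J))

# (B (x) C)(D (x) F) = (B D) (x) (C F), with conformable shapes
C2, D2, F2 = (rng.standard_normal(s) for s in [(4, 2), (3, 2), (2, 3)])
check_mixed = np.allclose(np.kron(B, C2) @ np.kron(D2, F2), np.kron(B @ D2, C2 @ F2))

# vec(S^{-1} u theta') = (I (x) S^{-1})(theta (x) u), the form used for (7.8)
S_inv = np.linalg.inv(np.array([[2.0, 0.5], [0.5, 1.0]]))
u, theta = rng.standard_normal(2), rng.standard_normal(3)
check_score = np.allclose(
    vec(S_inv @ np.outer(u, theta)),
    np.kron(np.eye(3), S_inv) @ np.kron(theta, u),
)

print(check_outer, check_vec, check_mixed, check_score)
```

Column-major stacking (`order="F"`) matters here: with NumPy's default row-major flattening the first two identities would hold with the roles of the Kronecker factors exchanged.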

Now we have the ingredients for proving our results. 

Proof of Proposition 13.11 1) and 2). For the proof of (|3.2p we write using (|2.ip 
and (122]) 

T5(^GL5 - ^o) = S^'vec (Sxi,), (7.11) 

with 

T 

t=i 
Since we have Xt — X)i=o ^i'^^-i^ i^ follow from Lemma 17.41 that

% = / 51 {^^C^py-P ® S(0)^^} ® 5](r)-idr + Op(l) = Ai + Op(l). 
Using (|7.8p we obviously have 

Sxn^AA(0,Ai), 

so that we obtain the result p.2p . 

For the proof of (|3.3p we write similarly to (jT.lip 

T^OoLS - Oo) = E^^vec {±xu), 
with 

T 

From ([7?7)) and ([7^ we write 



^x 



I OO 





5^ {^^(lpxp (» S(r))V4} dr (» /d + Op(l) = A3 + Op(l), (7.12) 



i=0 



and 



with 



vec(Ex„)=^AA(0,A2), 



A2=/ ^{^'.(lpxp®S(r))^^}(^I](r)dr, 

SO that we obtain the result p.3p . 

In this part we show that A3 is positive definite. To this aim it suffices to show that 
the matrix A3 = X)j=o '4'ii'^pxp ® S(''))'0i is positive definite for all r. Let us consider 
a pd-dimensional vector A 7^ 0. If A3 is not positive definite we have 

00 CO 00 

J2 AVz(lpxp ® ^{rM>^ = E ^^^^ = E ^? = 0' 

i=0 4=0 1=0 

where A^ = {X[ijJiG{r), . . . , A' ^i_p+iG'(r)) with obvious notations. Therefore we have 
Ai = for all i e N. First consider Aq. In this case we have ipo ~ Id and V'-i — "0-2 ~ 
... = Tpi-p^i = 0. Since we assumed that S(r) is positive definite we can deduce that 
Ai = 0. Similarly Ai implies that A2 = 0, A2 implies that A3 = and so on. Thus 
$\lambda = 0$, which shows that $\Lambda_3$ is positive definite. Using similar arguments and since
the Kronecker product of two positive definite matrices is positive definite, it can be
shown that the matrices $\Lambda_1$ and $\Lambda_2$ are positive definite.

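3) Using the Cholesky decomposition for positive semidefinite matrices we can write

$$\sum_{i=0}^{\infty} \psi_i (1_{p \times p} \otimes \Sigma(r)) \psi_i' = Z(r) Z(r)',$$

and let

$$B_k(r) = \{ Z(r) \otimes \Sigma(r)^{k - 3/2} \}', \quad k = 1, 2, \quad r \in (0, 1].$$

Then, by the properties of the Kronecker product we have

$$\Lambda_k = \int_0^1 B_k'(r) B_k(r)\,dr, \quad k = 1, 2,$$

and

$$\Lambda_3 = \int_0^1 B_2'(r) B_1(r)\,dr = \int_0^1 B_1'(r) B_2(r)\,dr.$$

Define

$$\Delta = \left( \int_0^1 B_2'(r) B_2(r)\,dr \right)^{-1} \int_0^1 B_2'(r) B_1(r)\,dr = \Lambda_2^{-1} \Lambda_3.$$

Following the idea of Lavergne (2008) we can write

$$\begin{aligned}
0 &\le \int_0^1 \{ B_1(r) - B_2(r)\Delta \}' \{ B_1(r) - B_2(r)\Delta \}\,dr \\
&= \Lambda_1 - \Delta' \int_0^1 B_2'(r) B_1(r)\,dr - \int_0^1 B_1'(r) B_2(r)\,dr\, \Delta + \Delta' \Lambda_2 \Delta \\
&= \Lambda_1 - \Lambda_3 \Lambda_2^{-1} \Lambda_3,
\end{aligned}$$

and this proves the stated result. Notice that the equality between the two asymptotic
variances holds if and only if $B_1(r) = B_2(r)\Delta$ for almost all $r \in (0, 1]$. □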
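The projection argument above can be illustrated numerically by replacing the integrals with averages over a grid. In the Python sketch below, the function name `efficiency_gap`, the random matrices standing in for $B_1(r)$ and $B_2(r)$, and the grid size are assumptions made for this illustration. Because the random matrices do not make the cross term symmetric, the code uses the general form $\Lambda_1 - \Lambda_3 \Lambda_2^{-1} \Lambda_3'$, which reduces to the expression of the proof when $\Lambda_3$ is symmetric.

```python
import numpy as np


def efficiency_gap(B1, B2):
    """Return L1 - L3 L2^{-1} L3' for grid-averaged L1, L2, L3.

    B1 and B2 have shape (n_grid, m, n); slice g plays the role of B_k(r_g).
    """
    n_grid = B1.shape[0]
    lam1 = np.einsum("gij,gik->jk", B1, B1) / n_grid  # approximates int B1' B1
    lam2 = np.einsum("gij,gik->jk", B2, B2) / n_grid  # approximates int B2' B2
    lam3 = np.einsum("gij,gik->jk", B1, B2) / n_grid  # approximates int B1' B2
    return lam1 - lam3 @ np.linalg.solve(lam2, lam3.T)


rng = np.random.default_rng(0)
B1 = rng.standard_normal((50, 4, 3))
B2 = rng.standard_normal((50, 4, 3))

# Generic case: the gap is positive semidefinite.
gap = efficiency_gap(B1, B2)
min_eig = np.linalg.eigvalsh((gap + gap.T) / 2).min()

# Equality case: B1(r) = B2(r) Delta for every r gives a zero gap.
Delta0 = rng.standard_normal((3, 3))
gap_eq = efficiency_gap(B2 @ Delta0, B2)

print(min_eig, np.abs(gap_eq).max())
```

The smallest eigenvalue is nonnegative up to rounding in the generic case, and the gap vanishes in the equality case, in line with the characterization given at the end of the proof.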

Proof of Proposition [3721 For the proof of p.Sp we write 






= T-'Y.^,u',- 



Y.\T-'Y.^tX[_\{AO^'-A.,)' 



Y^{AO^^-A,)\t-'Y.^,^^< 



,i=l 
P 



t=l 
T 



^(i?^^ - A,) T-i Y. ^*-^*-^ i^?'' - ^0' 



i=l 
Cl + C2 + C3 + C4, 



(7.13) 
with obvious notations. Using arguments similar to those of the proof of (7.9), it is easy
to see that we have 

T T 

T~^Y.''^^t-^ = Op{T-i) and T-i^X,'_,ii, - Op{T~i). 
t=i t=i 

Then since Af^^ — Ai = Op{T~^), we write C2 = Op(l) and C3 = Op(l). From 
relation (|7.7p . it is also easy to see that we have C4 = Op(l)- Let us define wt = 
vec(utMj — vec{G{t/T)G{t/T)'). Since {wt, J"t„i} is a-mixing by Theorem f4.1 in 
Davidson (1994), and E \\ wt |P< 00 by Al. we have by the law of large numbers for 
L^-mixingales 



T T 

T-^Y.^ec{utu't) - ^lim^T-i^S{vec(utw;)} + Op(l) 
t=i t=i 



= Ji?^ ^"' E vec(G(i/T)G(i/T)') + 0^(1) 

= vec / S(r)dr + Op(l), 
Jo 

and we obtain p.Sp . For the proof of p.7p we have similarly to (|7.13p 



T ^'^Ut-lu't_i(E)Utu't=T ^ y^ Mf_lMt_;^ (g) Ufu'f. + Op(l). 
t=2 t=2 

From the Cauchy-Schwartz inequality and by Assumption Al we have 

E I u,t-iu,t_iuktuit r< {Eiuu^if''E{u,t^i)^''E{ukt)''''E{uit)*''}-^ < cx). 
Then using again the law of large numbers for L^-mixingales and since 

E{ut-iu't_-^ (g) utu't) ^ E{ut-iu't_-^ (g) E{utu't \ Tt-i)) = ^t-i ® ^t, 
we write 

T T 



T ^^ut-iUt_i<gutUt = lim T ^ ^ I]t_i » Et + Op(l) 

t=2 t=2 

T 

lim T-^ Y^ T.{t - 1/T) ® Ut/T) + Op(l). 



T->oo 

t=2 
Finally noting that

lim T-i ^ I](i - l/T) ® T.{t/T) = / S(r)®2dr + Op(l). 

+ — n "^ 



t=2 



we obtain dXT]). The proof of (jXTU)) follows from (fTT^ . For the proof of dnH), we 
write as above 

T T 

t=i t=i 

so that we obtain the desired result from similar arguments used for the proof of (j7.8p . 

n 



Proof of ([3^^]) . ([31^ and ([JJ^ We only prove ((3l^ . The proofs of ((3ll|) and 
(|4.ip are similar. From the proof of Theorem 13. II we have 



S(^^) 



S_^=A3+Op(l). 
Using Lemma 17.31 we also obtain 

T 

vec {t^} A ^i^^T-^ J2 ^'^^ {E{Xt-iX't_^) /4- 

°° t=i 

Straightforward computations show that 

CO 

vec {£;(Xt_il,'_i) ® /4 = ^{(A ® /d)®'}Vec 

1=0 

Then considering similar arguments used in the proof of Lemma 17.21 we write 

lim vec {^(X[Tr]-l-^[Tr]-l) ® Id} 

so that we obtain 

J^ i:{r)dr (g> Id 



(7.14) 



T 





vec 



S(r) ® /d 




vec{S^} = {V2)2-(A®/d)®2| 1 



vec 



0,(1). 



by using similar arguments of the proof of (|7.6p . Therefore the result p.l2p follow 
from ([714)) . D 
Proof of Proposition 4.1. In the following, $c$, $C$, ... denote constants with possibly
different values from line to line. First, let us focus on the asymptotic equivalence
between $\hat\theta_{ALS}$ and $\hat\theta_{GLS}$ uniformly w.r.t. the bandwidths $b_{kl} \in \mathcal{B}_T$. We extend the
arguments of Theorem 2 in Xu and Phillips (2008). Consider the notation

T T 

^(r) = r-i^x,„ix;_i®r,-\ and a(r) = T-i/2^rr^«tl,'_i. 

Then 

Vt{0als - 0GLs) = Aity'vec (a(S)) - A{Ey^vec (a(E)) 
= A{t)-^ {vec (a(S)) - vec (a(I]))} 

-A(I])-i {A{t) - A(I])} A(S)-ivec (a(S)) . 

P 

By our result (|7.6p . A(I]) — ^ Ai which is positive definite. Moreover, a(S) is bounded 
in probability by Markov's inequality. Lemma l7.H -a) considered with /i > 2 and the 
linear processes 'dt = Ut and (t = Xt-i, and the fact that T,^^ is bounded. Hence, like 
in the proof of Theorem 2 of Xu and Phillips (2008), to prove that Vt(6als — (^gls) = 
Op(l), uniformly w.r.t. bki G Bt-, it suffices to check 

A{t) - A{T,) ^ Op{l) and a(S) - a(S) = Op(l), (7.15) 

uniformly w.r.t. bki G Bt- As a direct by-product we also obtain Ai — Ai = Op(l) 
uniformly w.r.t. the bandwidths bj^i- Let us define 

T T 

'Et=^^Wti'^u^u[ and St^^WijOE^, (7-16) 

and, following Xu and Phillips (see also Robinson, 1987), notice that the results in 
()7.15p are consequences of the following eight rates obtained uniformly w.r.t. bki G Bt'- 
(a) a(E") - a(E) = Op(l); (a') a{t) - a(EO) == Op(l); (b) a(E) - a(S) = Op(l); (c) 
a(E) - a(I]) = Op(l); (d) A(E0) - A{h) = 0^(1); (d') A(E) - A(E") := Op(l); (e) 
a(j:) - A(i;) = Op(l); (f) A{t) - A(i;) = Op(l). In this proof the norm || • || is 
the Frobenius norm, which in particular is a sub-multiplicative norm, that is $\|AB\| \le
\|A\| \|B\|$, and for a positive definite matrix $A$, $\|A^{-1}\| \le C [\lambda_{\min}(A)]^{-1}$ with $C$ a constant
depending only on the dimension of $A$. Moreover, $\|A \otimes B\| = \|A\| \|B\|$. To simplify
notation, let $b$ denote the $d(d+1)/2$-dimensional vector of bandwidths $b_{kl}$, $1 \le k \le l \le d$. Below we
will simply write uniformly w.r.t. $b$ instead of uniformly w.r.t. $b_{kl}$, $1 \le k \le l \le d$, and
$\sup_b$ instead of $\sup_{b_{kl} \in \mathcal{B}_T,\, 1 \le k \le l \le d}$.

(a) Using the identity $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ we can write



im - a(E) = r-i/2 ^ (S?)-^ {L -S?} C utXU 



t=i 



Take the norm on the right-hand side and apply Lemma I7.6r f.h.il. Cauchy-Schwarz 
inequality and the fact that T^^ 12t=i ll^t^t-iP ~ Op{l) by Lemma [TTTJ-a) . 
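The matrix facts invoked in this proof (the inverse-difference identity, sub-multiplicativity of the Frobenius norm, its behaviour under Kronecker products, and the bound of the norm of an inverse by a constant over the smallest eigenvalue) can be verified numerically. The Python sketch below uses random symmetric positive definite matrices; the helper `random_spd` and the choice $C = \sqrt{n}$ for the constant are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)


def random_spd(n):
    """Random symmetric positive definite matrix (illustration only)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)


n = 3
A, B = random_spd(n), random_spd(n)
A_inv, B_inv = np.linalg.inv(A), np.linalg.inv(B)
fro = np.linalg.norm  # the Frobenius norm is the default for 2-D arrays

# A^{-1} - B^{-1} = A^{-1} (B - A) B^{-1}
identity_ok = np.allclose(A_inv - B_inv, A_inv @ (B - A) @ B_inv)

# ||A B|| <= ||A|| ||B||
submult_ok = fro(A @ B) <= fro(A) * fro(B)

# ||A (x) B|| = ||A|| ||B||
kron_ok = np.isclose(fro(np.kron(A, B)), fro(A) * fro(B))

# ||A^{-1}|| <= sqrt(n) / lambda_min(A) for symmetric positive definite A
inv_bound_ok = fro(A_inv) <= np.sqrt(n) / np.linalg.eigvalsh(A).min()

print(identity_ok, submult_ok, kron_ok, inv_bound_ok)
```

The last bound follows from the fact that the squared Frobenius norm of $A^{-1}$ is the sum of the squared inverse eigenvalues of $A$, each of which is at most $\lambda_{\min}(A)^{-2}$.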

(a') Use the same decomposition to write 



(SO) - ait) = r-V2 ^ (sO)-i {t, ^ t1} t-^HXU (7.17) 



t=i 



Now, if jl • JI2 denotes the spectral norm, use the inequality 



\B'/' A'f% < \ [max{|l A-i|b, \\B-^h}] "' \\B A\[ 



(see for instance Horn and Johnson (1994), page 557), and deduce that 



|S?-St||2< 






{nf + ^Th 



-1 


2 




\n)\ 


-1 


j] 



1/2 



Now, if r e (0, 1] and Ap-r] — B ^ Op(l) with B positive definite, it is easy to check 
that 11^1^^112 < {l + Op(l)}||-B^^||2- Use Lemma [731 and Assumption Al'(i) to deduce 
that the spectral norms of [(S") + UTld]^^ and [(S°) ]^^ are bounded in probability. 
Finally, take spectral norm on the right-hand side of ()7.17p . use the fact that vt ~ 
o(r-i/2) and deduce (a'). 

(b) Consider the identity

$$A^{-1} - B^{-1} = B^{-1}(B - A)B^{-1} + B^{-1}(B - A)A^{-1}(B - A)B^{-1}$$

and write



T 

T _ _ ^ ^_x 

^j,-i/2^^-ip^_ St] E* Pt- Et]E,-i7/tX,'_i 



t=i 

T 



T-V2^Ai,(6)+T-V2^A2,(&) 



i=l t=l 

-: Ai(&) + A2(&). 

Note that by equation (22) in Xu and Phillips (2008), 

{Au{b),Tt} = {S-ipt- ht]^^'utX^^,Tt} 

is a martingale difference (m.d.) sequence indexed by the bandwidths b\j To prove 
that Ai(fe) = Op(l) uniformly w.r.t. b we show that this uniform rate holds cellwise. 
For this purpose it easy to see that it suffices to prove that 



1 ^ 

^Tih) = -y=^etUJ^Wt^{h) = Op(l) 



T 

T 
^ t=i 

uniformly w.r.t. h G Bt where {£t,J-t\ and {aJt,J-t} are univariate m.d. sequence 
satisfying suitable moment conditions and 

svi]i E{e1 + uj"^ \ Ft-i} <oo. (7.18) 

t>i 

More precisely, StOJi could be any cell of 



† It is important to notice that for a fixed bandwidth the sequence $(\Delta_{1t}(b))$ is not adapted to
the filtration $(\mathcal{F}_t)$. As a consequence, the expectation $E\{\Delta_{1t}(b)' \Delta_{1s}(b)\}$ is not necessarily zero and
therefore the equality $E\{\| \Delta_1(b) \|^2\} = T^{-1} \sum_{t=1}^{T} E\{\| \Delta_{1t}(b) \|^2\}$ does not necessarily hold.
Using the Inverse Fourier Transform and a change of variables, we rewrite

T 

Srih) 



1 ^ 

j^ J2 Stt^J^'it/T; h)K{{t - i)/Th) - Arih) 



tVt h 



T' 



u=sh 1 ) ^ \~^ 



Ttt Mt/T-h) \ T , 



{;^E^^(i + Nr^< 



exp —2'K\/—1 — s 



T J I (1 + kl) 
T[K]{sh) 



J-[K]ish) , ^ ,^. 
J^_^ds-AT(/i) 



where 



T 



TVThfr^ 



fT{t/T;h) = T ^h '^^j^iK{{t — j)lTh) and t is any (arbitrary small) positive 
constant. It is easy to see that Assumption Al'(i) and inequality (|7.24p imply 



sup |At(/^)| = Op(T-i/26-i) = o.p{l). 

heBr 



Since 



;^ /. imP "- ^;w- /.""'''■><•"' ^^ ^ ° (K^ 



provided r is sufficiently small, it suffices to prove 

T 



-1/2 



sup sup 



--=^5it(/i,.s) 



Op(l) 



and 



sup 



1 



Op(l) 



(7.19) 



(7.20) 



Let us notice that 

T - '-3+1 

fT{t/T;h)^h-'Y.LJ 



j = l T 

For < c-L < ^ < c::L define 






z=r/h /■ Th / [Tzh] 

K 

*-£ V Th 

Th ^ 



dz. 



fT{t/T;bT/^)^ I ' K{z)dz. 

Thrj 
Then, if $K(\cdot)$ is differentiable, for any $1 \le t \le T$ and $h \in \mathcal{B}_T$



fT{t/T-h)-fT{t/T-h) 



< 



< 



< 



K (z) - K 



[Tzh] 



Th 



— CO J Z 



Th 
K'{v)\dvdz 

\K'{v)\dvdz 



dz 



\K'{v)\ 



dzdv < 



C 



When the K{-) is differentiable except a finite number of points, the same type of upper 
bound can be derived after minor and obvious changes. Hence, with the notation 



Sit{^,s) 



et{l + \s\)-^ 



t 



exp 27rV^-s , {)<c^\,<'d< 



since for any real numbers a and b, a^^ = b^^ + (b—a)/ab and knowing that /t(-; bT I'd) 
and fT{''ibT I'd) are uniformly bounded away from zero (see inequality (|7.24p below), 
we obtain 



sup sup 

■a s 






Therefore is suffices to prove 



sup sup 






Op{l) 



(7.21) 



in place of ()7.19p . For proving ()7.20p and ()7.2ip we use a uniform CLT for m.d. indexed 
by a class of functions, see Bae et al. (2010), Bae and Choi (1999). Here our indexing 
classes functions depend on c~^^ < ^ < ^min ^^^"^ s G M. respectively on ,s e M, and we 
prove that their covering numbers are of polynomial order independent of T. Now we 
can explain the unique role of the (1 + \s\Y function: it cuts the high frequencies of the 
complex exponential function and allows one to obtain polynomial covering numbers. 
Consider the family of functions Fn = {ipii{-;'d) : [0, 1] -^ [0, 1] : c„^^ < "(? < c^i„} 
and J"i2 = {(<5i2(-; s) : [0, 1] ^ C : s e R} where 

V'iiir;d)=fTir;bT/^) = FK{^r/bT) ~ FK{^{r ~ l)/bT), Vi2{r-:S) = ^^j-^--—^, 

(i + \s\y 

where $F_K(\cdot)$ is the cumulative distribution function associated to the density $K(\cdot)$. By
Lemma 22-ii) and Lemma 16 of Nolan and Pollard (1987), the class $\mathcal{F}_{11}$ is a VC-class
(also called Euclidean) for a constant envelope. The VC-property for $\mathcal{F}_{12}$ is proved
for instance in Lopez and Patilea (2010).

Now we check the conditions of Theorem 1 of Bae et al. (2010) in order to derive
(7.20) and (7.21). For simplicity, we only provide the details for (7.21). With the
notation of Bae et al. (2010), j = t, n ^ j{n) = T, Srt = J"*, X = M x [0,1], 
Vrtif) = r-i/2£,^,2(i/T;s)^n(i/T;^)-i with {d,s) eT = [c^^l^.^^^L] x «: the 
family T being composed of the functions f{e,r) = eLpi2{r; s)ipii{r;d)~^ , (i?, s) G T, 
with envelope F{e,r) = Ce for some sufficiently large constant C. Moreover, define 






2^ 

dr 

Notice that supQ^^^j^ E{e?rpi) < C for some constant C, </3ii(-; i?) is uniformly bounded 
and bounded away from zero (see (|7.24p ). ipi2{-] s) is uniformly bounded, and 
^ /' 'Pi2{r;si) _ (pi2{r;s2) Y 

< Ci(i?i-^2)' / {Kir/c^axbTy/brV +{Kiir-l)/c„,a.bT){r-l)/bTf dr 



Jo 

+C2 {friir; Si) - (pi2{r; 82)}'^ dr 
Jo 

< C3{6t(^1 - ^92)' + (.si - S2)2}, 

V(i?i, Si), (??2> S2) G T, for some constants Ci,--- , C3 > 0. Moreover, for any p > 
there exists Cp > such that L 'fi\2{''"^ •s)'^'' 1^ P,^s > Cp. Using these properties, on one 
hand we check that the pseudometric space $(\mathcal{T}, d(\cdot,\cdot))$ is totally bounded and, on the
other hand, we check that condition (2) of Bae et al. (2010) holds for some sufficiently
large $L$ given that the conditional variance of $\epsilon_t$ is deterministic and bounded. The
convergence to zero for i„((S) in Bae et al. (2010) is a direct consequence of our 
unconditional moment conditions on $\epsilon_t$. The uniformly integrable entropy condition is
ensured by the VC property satisfied by the classes $\mathcal{F}_{11}$ and $\mathcal{F}_{12}$ and the finite second
order moment for $\epsilon_t$. Now all the required ingredients are gathered to apply Theorem
1 of Bae et al. (2010) and to deduce our property (7.21).

For the uniform Op(l) rate of A2 take the norms and apply Lemma \7.6\ c.e.i) and 
the moment assumptions. 

The results (c) to (f) are obtained by obvious adaptation of the corresponding proofs
in Xu and Phillips (2008), hence the details are omitted.

Finally, to derive the result for (li, use again the identity $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$,
the identity $\|A \otimes B\| = \|A\|\,\|B\|$, the triangle inequality and apply Lemma
7.6 (e), (g), (h), (j). □

Lemma 7.5. Let $g_{kl}(r-) = \lim_{\tilde r \uparrow r} g_{kl}(\tilde r)$ and $g_{kl}(r+) = \lim_{\tilde r \downarrow r} g_{kl}(\tilde r)$, for $r \in (0,1]$
and $1 \le k,l \le d$. Define the $d \times d$ matrices $G(r-) = \{g_{kl}(r-)\}$ and $G(r+) =
\{g_{kl}(r+)\}$ and $\Sigma(r-) = G(r-)G(r-)'$, $\Sigma(r+) = G(r+)G(r+)'$. Set $\Sigma(1+) = 0$. Under
the assumptions of Proposition 4.1,

$$\ldots \; \Sigma(r-)\int_{-\infty}^{0} K(z)\,dz + \Sigma(r+)\int_{0}^{\infty} K(z)\,dz,$$

uniformly with respect to $r \in (0,1]$.
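
To illustrate the limit in Lemma 7.5, the following sketch (not taken from the paper) smooths a scalar variance function with a single jump using normalized kernel weights. The Gaussian kernel, the step function `sigma2`, the break location 0.5 and the values T = 2000 and b = 0.05 are all assumptions chosen for illustration. With a symmetric kernel both integrals in the lemma equal 1/2, so at the break the smoothed value should be close to the average of the left and right limits, while away from the break it should be close to the local value.

```python
import math

def gaussian_kernel(z):
    """Standard Gaussian density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def sigma2(r):
    """Hypothetical variance function with one jump at r = 0.5."""
    return 1.0 if r < 0.5 else 4.0

def smoothed_variance(t, T, b):
    """Kernel-weighted average of sigma2(i/T), i = 1..T, around date t."""
    indices = range(1, T + 1)
    weights = [gaussian_kernel((t - i) / (T * b)) for i in indices]
    total = sum(weights)
    return sum(w * sigma2(i / T) for i, w in zip(indices, weights)) / total

T, b = 2000, 0.05
at_break = smoothed_variance(T // 2, T, b)       # r = 0.5, the jump point
away_from_break = smoothed_variance(T // 4, T, b)  # r = 0.25, a continuity point
print(at_break, away_from_break)
```

The first printed value can be compared with 0.5 * 1 + 0.5 * 4, the second with the left-regime value 1.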

Proof of Lemma 7.5. It suffices to notice that equation (19) of Xu and Phillips (2008)
can be obtained uniformly w.r.t. $r \in (0,1]$ and $b_{kl} \in \mathcal{B}_T$, $1 \le k \le l \le d$, and to prove

sup sup {|lA[Tr.]|l + ||B[Tr]!|} = Op(l), 

bkieBT.l<k<l<dre(0.l\ 

„ o o _ o _ 

for v4[Tr] = ^fTrl^ "^[Tr] ^-iid i?[Tr] =Ti[Tr] ^^[Tr\ wherc Yit and Ef are deffired in 
equation (|7.16p . The uniform convergence of An^^^ and -BrT^l is easily obtained from 
Lemma rn^ d.gV D 

In the following lemma, which is an extension of the statements (d) to (l) in Lemma
A of Xu and Phillips (2008), we gather some results used in the proof of Proposition
4.1. Let $w_{ti,kl} = w_{ti}(b_{kl})$ denote the element $kl$ of the $d \times d$ matrix $w_{ti}$ that is a function
of the bandwidth $b_{kl}$.

Lemma 7.6. Let $\|\cdot\|$ denote the Frobenius norm. Under the assumptions of
Proposition 4.1,

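(a) For all $T \ge 1$ and $1 \le k \le l \le d$,

$$\max_{1 \le t \le T} \sum_{i=1}^{T} \sup_{b_{kl} \in \mathcal{B}_T} w_{ti,kl} \le C < \infty,$$

for some constant $C$.

(b) For all $T \ge 1$ and $1 \le k \le l \le d$, $\max_{1 \le t,i \le T} \sup_{b_{kl} \in \mathcal{B}_T} w_{ti,kl} \le C/(T b_T)$ for some
constant $C > 0$.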
(c) For all T , inf(,ggy mini<t<T Xmini'^t) > C > for some constant C.

(d) As T ^oo, 

max £; f sup II L -tt\A = Op((1/(T6t)') • 

l<t<T Vfcei3r / 

(e) For (5 = 1,2, as T — > cx3, 
max sup II Et -SJ'' = Op{T-^/%^^^). 



l<t<T 



beBj 



(f) AsT^<^, 



(g) AsT^rx), 



(h) As T ^oo, 



inf mm A,„i„(St) = Op(l) 

beBr l<t<T ^ ' ( '"- ' 



max sup ||E?- E* || = OpiT-^^^^^'^). 



l<t<T 



beBi 



inf min XminC^t) ] = Op{l). 

beBr l<t<T ^ J ' 

(t) AsT^^, sup.^e^ Ef=i lis?- Lr^ Op{T-^b^') 
0) AsT^cx^, sup.ee. T-i Ef^, IJSt - EJ = o(l). 

Proof of Lemma 7.6. (a) Using the monotonicity of $K(\cdot)$ we can write

J2i=l K[[t - i)/bTCmax) 



max > sup 



WtiM < max 



b., GBt 1<*<^ Er=l ^^((* - i)/bTCm^m) " /^(O) ' 



(7.22) 



(7.23) 



Now, using again the monotonicity of $K(\cdot)$ and adapting the lines of Lemma A(c) in
Xu and Phillips (2008), for any $h \in \mathcal{B}_T$ and any $1 \le t \le T$,

T / , .\ T 

Th 



Th /-^ 



Th 



t-i+l 
Th 



J2I^ -'^ Kf^I^]dz< I max 



Th 



i^W,A-(z-i^ 



dz < 2. 



i=l ^ ' 1 = 1 Th 

This allows us to control the numerator on the right-hand side of (7.23). On the other
hand, using similar arguments and the fact that $K(0) > 0$, for any $h \in \mathcal{B}_T$, any
$1 \le t \le T$ and any $0 < \gamma_1 < \gamma_2 < \infty$,

T 

Th 

i=l '■ ' " Th 

-71 



^E-^^. > 



Th 



t-T 
Th 



^(^),i^(-^ 



dz 



> min 



K{z)dz, / K{z)dz 
■72 •'71 



72 



(7.24)

provided that $T$ is sufficiently large. The last two integrals in the minimum are strictly
positive for a suitable choice of $\gamma_1$, $\gamma_2$. This fixed lower bound considered for $h = c_{\min} b_T$
allows us to control the denominator on the right-hand side of (7.23) and thus to prove
(a) with a constant depending on $\gamma_1$, $\gamma_2$ and $c_{\max}/c_{\min}$.

(b) For all $1 \le k \le l \le d$,



Wtihl < 



1 K ( *-' ^ 

1 r^ k( *-■' ] 



Now, use the fact that $K$ is bounded, $c_{\max}/c_{\min} < \infty$ and Lemma A(c) of Xu and
Phillips (2008) to derive the upper bound.
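
The bounds in (a) and (b) can be checked numerically on a small example. The sketch below is an illustration under assumptions that are not from the paper: a Gaussian kernel, weights normalized with the term i = t left out (a leave-one-out convention assumed here), a finite grid standing in for the bandwidth range $[c_{\min} b_T, c_{\max} b_T]$, and arbitrary values of T, b_T, c_min and c_max. The helper names `weights` and `sup_weight_stats` are hypothetical. It reports the largest row sum of the weights after taking the supremum over the grid, and T b_T times the largest single weight; both should stay bounded as T grows.

```python
import math

def K(z):
    """Gaussian kernel (an assumption made for this illustration)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def weights(t, T, b):
    """Leave-one-out kernel weights w_ti(b), i = 1..T, with w_tt = 0."""
    raw = [0.0 if i == t else K((t - i) / (T * b)) for i in range(1, T + 1)]
    total = sum(raw)
    return [x / total for x in raw]

def sup_weight_stats(T, b_T, c_min, c_max, n_grid=5):
    """Return (max_t sum_i sup_b w_ti(b), T * b_T * max_{t,i} sup_b w_ti(b)),
    where the supremum runs over a grid of bandwidths in [c_min*b_T, c_max*b_T]."""
    grid = [b_T * (c_min + k * (c_max - c_min) / (n_grid - 1)) for k in range(n_grid)]
    worst_sum, worst_weight = 0.0, 0.0
    for t in range(1, T + 1):
        rows = [weights(t, T, b) for b in grid]
        sup_w = [max(col) for col in zip(*rows)]  # sup over the bandwidth grid
        worst_sum = max(worst_sum, sum(sup_w))
        worst_weight = max(worst_weight, max(sup_w))
    return worst_sum, T * b_T * worst_weight

for T in (100, 200, 400):
    print(T, sup_weight_stats(T, b_T=T ** -0.2, c_min=0.5, c_max=1.5))
```

Because each set of weights sums to one for a fixed bandwidth, the first reported quantity is at least 1; the point of (a) is that taking the supremum inside the sum does not make it grow with T.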

(c) This is an easy consequence of Assumption A1'(i) and the proof of Lemma 7.5,
equation (19), that holds uniformly w.r.t. $r \in (0,1]$ and $b_{kl} \in \mathcal{B}_T$, $1 \le k \le l \le d$.

(d) Let $a_i(k,l)$ denote a generic element of the $d \times d$ matrix $u_i u_i' - \Sigma_i$. Then we
can write



E I sup 

beBr 



T 

E 

?:=l 



Wti{UiU't - ^i) 



< £■ I sup 

beBj 




< cY.E 



k,l = l 



y^ sup wti,ki\ai{k,l)\ 



beB 



where $c$ depends only on $d$. Now, by Lemma A(f) of Xu and Phillips (2008) and (a)-(b)
above, for $1 \le k \le l \le d$



/ T 



£■( > ' sup wti,ki\ai{k,l)\ < 



\i=l 



beBi 



max y sup 



Wu.kl 



b£Bj 






sup Wti,kiE\a^{k,l)\^< 



b£Bj 



{ThTY ' 



for $c$ a constant depending only on $K$, $c_{\max}/c_{\min}$ and the upper bounds of the 4th
order moments of the components of $u_i u_i' - \Sigma_i$. Now, (7.22) follows.

(e) By Markov's inequality and obvious algebra we can write
P { max sup II St -tt\\^ > CT^^/%^^^ 



P { max_ sup II St -Sjf > C^/^T-^bj. 



^ II Ijt —^t\\ ^ L/ • J- Urr 



< C-'^I^ThlE max sup |1 St -St||^ 



<C-^"TbiyE[ sup ||St-Stf 

= C^*/*r6|r max £: ( sup 11 St -St||M 
i<t<T \beBT J 

= C-4/*0(l) 



where for the last equality we use (d). 

(f)+(h) Using equation (3.5.33) in Horn and Johnson (1994) and (e) above we have 

o _ o _ 

min A,„i„(St) > min A„ii„(St) - max A™„(St) - A„ii„(St) 

l<t<T i<t<T l<t<T 

_ o _ 

> min A„ii„(St) - sup max || St -St|| 

l<t<T 6eBTl<*<^ 

= min A„im(St) + Op(l), 

l<t<T ^ 

and hence (f) follows from (c). Similar algebra applies for (h) which will follow as a 
consequence of (g). 
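
The chain of inequalities in (f) replaces a difference of smallest eigenvalues by the norm of the difference of the matrices. For symmetric matrices this is a standard perturbation bound, $|\lambda_{\min}(A) - \lambda_{\min}(B)| \le \|A - B\|$, which also holds with the Frobenius norm used here. The following sketch checks it on random symmetric 2 x 2 matrices; the restriction to dimension 2 (where eigenvalues have a closed form), the sampling range and the function names are illustration choices, not part of the paper.

```python
import math
import random

def lambda_min_sym2(a, b, c):
    """Smallest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    return 0.5 * (a + c) - math.sqrt(0.25 * (a - c) ** 2 + b * b)

def frobenius_sym2(a, b, c):
    """Frobenius norm of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    return math.sqrt(a * a + 2.0 * b * b + c * c)

def check_perturbation_bound(n_trials=1000, seed=0):
    """Check |lambda_min(A) - lambda_min(B)| <= ||A - B|| on random pairs."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        A = [rng.uniform(-2.0, 2.0) for _ in range(3)]
        B = [rng.uniform(-2.0, 2.0) for _ in range(3)]
        gap = abs(lambda_min_sym2(*A) - lambda_min_sym2(*B))
        dist = frobenius_sym2(*(x - y for x, y in zip(A, B)))
        if gap > dist + 1e-12:  # small slack for floating-point rounding
            return False
    return True

print(check_perturbation_bound())
```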

(g) + (i) Adapt the proof of Lemma A(i) and A(k) of Xu and Phillips (2008) using a
decomposition like in our equation (7.13).

(j) Apply Lemma A(l) of Xu and Phillips (2008) componentwise, that is $d^2$ times.

□

Proof of (5.3) and (5.6). Let us denote

S = i?A3"^A2A3"^i?' and Es = RA^gA2sA^sR'- 
From the expressions of Qols and Qpi^g we write 

I Qols - Q'ols \<T\\ Mols \?\\ S"^ - ^f |h Op(l), 
since T \\ ROqls |P= Op(l) and || S~^ — S^^ ||= Op(l) from the consistency of the 



estimators of A2 and A3 . In addition we write 

) + Qols _ 

2 



Qols + Qols _'^ Q/ p'r^-i 1 ■=■- ii pa 

OLS^i^ +^S \R^OLS- 
Noting that {S-i + EJ^}/2 = RA^^A2A^^R' + Op(l), we have {Qols + Qqls}/'^ ^
Xpd d ■ Since $\max(a,b) = \{a + b + |a - b|\}/2$, the result (5.6) follows from (5.4) and
The result (5.3) can be obtained in a similar way. □

Tables and Figures



Table 1: Empirical size (in %) of the different Wald tests in the case of iid homoscedastic
innovations. We take $a_{11} = a_{22} = 0.2$, $a_{21} = 0.1$ and $a_{12} = 0$. The errors are standard
Gaussian.

T                         50     100     200     400
W_OLS                    8.3     6.1     5.5     5.4
W_OLS, 2nd variant       9.3     6.2     5.3     5.4
W_OLS, 3rd variant       9.5     6.6     5.6     5.6
W_S                      7.1     5.3     4.9     4.9
W_ALS                   12.4     5.2     5.3     5.1
W_ALS, 2nd variant      13.3     5.5     5.4     5.1
W_ALS, 3rd variant      13.5     5.5     5.4     5.1
W_GLS                    6.2     4.9     5.0     4.5
W_GLS, 2nd variant       7.6     5.6     5.4     4.9
W_GLS, 3rd variant       8.0     6.0     5.4     5.1
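
As a rough illustration of how an entry of Table 1 could be produced, the sketch below simulates a bivariate VAR(1) with $a_{11} = a_{22} = 0.2$, $a_{21} = 0.1$, $a_{12} = 0$ and iid standard Gaussian errors, and records how often a standard Wald test of $a_{12} = 0$ rejects at the 5% level. This is not the authors' code: the function names, the number of replications, the burn-in period, the absence of an intercept and the use of the homoscedastic OLS variance formula are all assumptions, so the output should only be expected to land in the same range as the sizes reported for the standard test.

```python
import random

CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of the chi-square(1) law

def simulate_var1(T, a11, a12, a21, a22, rng, burn=50):
    """Simulate T observations of a bivariate VAR(1) with N(0, I) errors."""
    x1 = x2 = 0.0
    path = []
    for step in range(T + burn):
        # Both components are updated from the previous values of x1 and x2.
        x1, x2 = (a11 * x1 + a12 * x2 + rng.gauss(0.0, 1.0),
                  a21 * x1 + a22 * x2 + rng.gauss(0.0, 1.0))
        if step >= burn:
            path.append((x1, x2))
    return path

def wald_a12(path):
    """Standard Wald statistic for a12 = 0 in the first equation (no intercept)."""
    pairs = list(zip(path[:-1], path[1:]))  # (lagged regressors, current value)
    s11 = sum(z1 * z1 for (z1, z2), _ in pairs)
    s12 = sum(z1 * z2 for (z1, z2), _ in pairs)
    s22 = sum(z2 * z2 for (z1, z2), _ in pairs)
    r1 = sum(z1 * y for (z1, z2), (y, _) in pairs)
    r2 = sum(z2 * y for (z1, z2), (y, _) in pairs)
    det = s11 * s22 - s12 * s12
    b1 = (s22 * r1 - s12 * r2) / det  # OLS estimate of a11
    b2 = (s11 * r2 - s12 * r1) / det  # OLS estimate of a12
    rss = sum((y - b1 * z1 - b2 * z2) ** 2 for (z1, z2), (y, _) in pairs)
    var_b2 = (rss / len(pairs)) * s11 / det  # sigma^2 * [(Z'Z)^{-1}]_{22}
    return b2 * b2 / var_b2

def empirical_size(T, n_rep=1000, seed=1):
    """Rejection frequency (in %) of the 5% test when a12 = 0 is true."""
    rng = random.Random(seed)
    rejections = sum(
        wald_a12(simulate_var1(T, 0.2, 0.0, 0.1, 0.2, rng)) > CHI2_1_CRIT_5PCT
        for _ in range(n_rep))
    return 100.0 * rejections / n_rep

print(empirical_size(100))
```

Replacing the Gaussian draws by errors with a time-varying variance would give the analogous experiment for the heteroscedastic design, where the table reports a size distortion for the standard test.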






Table 2: Empirical size (in %) of the different Wald tests. The innovations are heteroscedastic
with $\gamma_1 = 20$, $\rho = 0.6$, and we take $a_{11} = a_{22} = 0.2$, $a_{21} = 0.1$, $a_{12} = 0$.

T                         50     100     200     400
W_OLS                    8.8     5.8     4.8     5.2
W_OLS, 2nd variant       9.7     6.5     5.0     5.4
W_OLS, 3rd variant      10.2     6.8     5.0     5.5
W_S                      9.3     8.1     6.6     8.0
W_ALS                    7.1     5.5     4.9     4.8
W_ALS, 2nd variant       8.3     6.2     5.6     5.4
W_ALS, 3rd variant       8.3     6.3     5.6     5.4
W_GLS                    5.2     4.1     5.2     4.2
W_GLS, 2nd variant       5.7     4.0     4.2     3.4
W_GLS, 3rd variant       6.3     4.4     5.4     4.2



Table 3: Empirical power (in %) of the different Wald tests. The innovations are iid
homoscedastic. We take $a_{11} = a_{22} = 0.2$, $a_{21} = 0.1$.

a12                     -0.4    -0.3    -0.2    -0.1     0.1     0.2     0.3     0.4
W_OLS                   98.3    86.5    53.3    20.5    18.1    54.3    84.5    96.8
W_OLS, 2nd variant      98.3    87.1    54.2    21.4    18.7    56.1    84.9    97.0
W_OLS, 3rd variant      98.3    87.5    54.7    21.9    19.2    56.5    85.3    97.0
W_S                     98.1    86.0    53.2    18.9    17.0    54.0    84.4    96.6
W_ALS                   98.3    86.6    53.0    19.8    17.4    55.7    84.4    97.0
W_ALS, 2nd variant      98.3    86.6    53.7    20.0    17.8    56.0    84.9    97.0
W_ALS, 3rd variant      98.3    86.7    53.8    20.2    17.9    56.2    84.9    97.0
W_GLS                   97.8    85.6    51.0    18.1    15.7    52.4    84.4    96.9
W_GLS, 2nd variant      98.7    88.1    53.0    19.6    17.5    55.2    85.8    97.6
W_GLS, 3rd variant      98.7    88.3    54.2    20.1    18.3    56.4    86.4    97.6

Table 4: Empirical power (in %) of the different Wald tests. The innovations are
heteroscedastic with $\gamma_1 = 20$ and $\rho = 0.6$. We take $a_{11} = a_{22} = 0.2$ and $a_{21} = 0.1$.

a12                     -0.8    -0.6    -0.4    -0.2     0.2     0.4     0.6     0.8
W_OLS                   96.9    81.4    48.1    17.3    14.2    40.3    70.0    90.6
W_OLS, 2nd variant      97.5    82.4    49.9    18.2    15.5    41.5    70.8    90.9
W_OLS, 3rd variant      97.7    83.4    51.5    18.7    15.7    42.2    71.4    91.3
W_S                     98.4    85.4    53.9    20.1    17.1    46.3    75.5    92.8
W_ALS                   98.8    86.7    50.8    17.7    13.5    45.2    75.4    93.0
W_ALS, 2nd variant      98.9    87.8    54.2    19.4    15.6    48.4    77.4    94.5
W_ALS, 3rd variant      98.9    87.8    54.2    19.4    15.6    48.4    77.4    94.5
W_GLS                   99.1    88.7    52.6    17.7    14.2    48.0    79.5    96.1
W_GLS, 2nd variant      99.1    89.9    52.9    18.1    12.4    46.9    78.3    95.1
W_GLS, 3rd variant      99.2    90.4    55.5    19.0    14.5    49.6    80.5    96.1



Table 5: The estimators of the autoregressive parameters of the VAR(1) model for the
balance data for the U.S.

Parameter              θ1             θ2             θ3              θ4
ALS estimation         0.33 [0.08]    0.02 [0.02]    -0.35 [0.30]    -0.07 [0.08]
OLS estimation         0.45 [0.23]    0.00 [0.02]    -1.02 [0.60]     0.1 [0.17]
Standard estimation    0.45 [0.07]    0.00 [0.02]    -1.02 [0.37]     0.1 [0.08]



Table 6: The balance data for the U.S.: the p-values of the ARCH-LM tests (in %) for the
components of the ALS-residuals of a VAR(1).

lags                    2        5        10
First component         22.26    45.05    36.44
Second component        25.32    73.32    77.18



Table 7: The p-values of the portmanteau tests (in %) for the checking of the adequacy of
the VAR(1) model for the U.S. trade balance data.

m           5        15
LB^         0.00     0.01
LBg^'       50.80    99.94
LBi^'       6.36     15.95

Table 8: The p-values of the Wald tests for Granger causality in mean (in %) from the U.S.
balance on services to the U.S. balance on merchandise.

W_OLS       8.74
W_S         0.57
W_ALS       25.20

Table 9: The statistics of the Wald tests for Granger causality in mean from the U.S. balance
on services to the U.S. balance on merchandise.

Q_OLS       2.92
Q_S         7.64
Q_ALS       1.31
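
Tables 8 and 9 can be cross-checked against each other. Assuming the Wald statistics are compared with a chi-square law with one degree of freedom (a single restriction is tested here; this choice is an assumption of the sketch and is not stated in the tables), the p-values follow from the complementary error function. Small discrepancies with Table 8 are to be expected because the statistics in Table 9 are rounded to two decimals.

```python
import math

def chi2_1_pvalue(q):
    """Upper-tail probability P(X > q) for X chi-square with 1 degree of freedom.

    Uses P(X > q) = P(|Z| > sqrt(q)) = erfc(sqrt(q / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(q / 2.0))

# Wald statistics as reported in Table 9.
wald_statistics = {"Q_OLS": 2.92, "Q_S": 7.64, "Q_ALS": 1.31}

for name, q in wald_statistics.items():
    print(name, round(100.0 * chi2_1_pvalue(q), 2))  # p-value in %
```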



Figure 7.1: The ratio Varas (j)2,OLs) /Varas {d2,GLs) of Example O

Figure 7.2: The coefficient ki of Example [SH

Figure 7.3: The RMSE of the estimators of the parameter $a_{11}$ over N = 1000 replications with
varying $a_{11} = a_{22}$, $a_{12} = 0$ and $a_{21} = 0.1$. The errors are homoscedastic and we take T = 100. The
RMSE are displayed in blue for the ALS estimators, in green for the OLS estimators and in red for
the GLS estimators.

Figure 7.4: The RMSE of the estimators of the parameter $a_{11}$ over N = 1000 replications with
varying $a_{11} = a_{22}$, $a_{12} = 0$ and $a_{21} = 0.1$. We take $\gamma_1 = 20$, $\rho = 0.6$ and T = 100. The RMSE
are displayed in blue for the ALS estimators, in green for the OLS estimators and in red for the GLS
estimators.

Figure 7.5: The same as in Figure [7^ but for $a_{12} = 0$ with varying $a_{22} = a_{11}$ and $a_{21} = 0.1$.

Figure 7.6: The same as in Figure [7741 but for $a_{21} = 0.1$ with varying $a_{22} = a_{11}$ and $a_{12} = 0$.

Figure 7.7: The same as in Figure [TTil but for $a_{22}$.

Figure 7.8: The balance on merchandise trade for the U.S. on the left and the balance on services
for the U.S. on the right in billions of dollars from 1/1/1970 to 10/1/2009, T=160. Data source: The
research division of the Federal Reserve Bank of Saint Louis, www.research.stlouis.org.

Figure 7.9: The differences of the balance on merchandise trade (on the left) and of the balance on
services for the U.S. (on the right).

Figure 7.10: The cross validation score for the ALS estimation of the VAR(1) model for the
differences of the balance on merchandise trade and on services in the U.S.

Figure 7.11: The ALS residuals of a VAR(1) for the differences of the balance on merchandise trade
and on services in the U.S. The first component of the ALS residuals is on the left and the second is
on the right.

Figure 7.12: The same as in Figure 7.11 but for the OLS residuals.



